Converting HTML and MIME to plain text mail

Ben Scott dragonhawk at gmail.com
Tue Oct 7 19:55:32 EDT 2008


On Tue, Oct 7, 2008 at 5:29 PM, Roger H. Goun <roger at bcah.com> wrote:
> No. In the general case, you have been handed a BASE64 or
> quoted-printable encoding of random 8-bit data. ... The first
> step is to remove the encoding, leaving unencoded 8-bit
> data, not 7-bit data.

  Ah, I see where our confusion comes from.  I was considering that
bullet list of "features" as an end result, the product, the final
requirement -- not a step-by-step procedure.  So at the end, this
hypothetical gateway would spit out 7-bit ASCII.  Internally, it would
need to process 8-bit data to get there, of course.  :)
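
  To make the two stages concrete, here is a minimal Python sketch:
first undo the transfer encoding to get raw 8-bit data, then reduce
that to 7-bit ASCII for output.  The function names and the UTF-8
default charset are assumptions made for illustration; they are not
part of any gateway discussed in this thread.

```python
import base64
import quopri


def decode_transfer_encoding(payload: bytes, encoding: str) -> bytes:
    """Undo a Content-Transfer-Encoding, yielding raw 8-bit data."""
    encoding = encoding.lower()
    if encoding == "base64":
        return base64.b64decode(payload)
    if encoding == "quoted-printable":
        return quopri.decodestring(payload)
    # 7bit, 8bit and binary need no decoding.
    return payload


def to_ascii(raw: bytes, charset: str = "utf-8") -> str:
    """Reduce decoded bytes to 7-bit ASCII.

    The default charset is an assumption; a real gateway would take it
    from the part's Content-Type header.  Anything ASCII cannot
    represent becomes '?'.
    """
    text = raw.decode(charset, errors="replace")
    return text.encode("ascii", errors="replace").decode("ascii")
```

  Replacing unrepresentable characters with "?" is the crudest
possible choice; transliteration (for example, dropping accents)
would probably be kinder to the reader.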

> I'm changing the requirements. :-) The goal should be to produce
> RFC-compliant email (or as close to that as possible) that the ancient
> MUA can handle.

  I have found that, even if mail is reasonably readable within the
backwards compatibility features of MIME, mail Luddites often still
complain about all this "extra stuff" their mail readers display.  So
this hypothetical gateway is doing processing not just for software,
but for people.  It's not the software that chokes on "MIME-Version:
1.0", but the operator.  ;-)

  Presumably, should such a gateway actually be implemented, it would
be easy enough to make it an option: Reduce to plain text only while
preserving most other modern features, or strip everything to get
that "true 1982 experience".

  Hmmm, this makes me think of another feature to add:

* Handle MIME file attachments

  Depending on the situation, it might make more sense to just decode
attachments and drop them as files in a directory somewhere, or append
them to the message body using UUENCODE or similar.
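
  The "drop them as files in a directory" option could look roughly
like the sketch below, which uses Python's standard email package.
The function name save_attachments and the fallback file name are
hypothetical, and the sketch does not handle file name collisions.

```python
import email
import os
from email import policy


def save_attachments(raw_message: bytes, out_dir: str) -> list:
    """Decode each attachment and write it to out_dir.

    Returns the paths written.  Later attachments overwrite earlier
    ones with the same name; a real gateway would need to handle that.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.iter_attachments()):
        # basename() discards any directory components a sender may
        # have put in the file name.
        name = os.path.basename(part.get_filename() or "")
        if not name:
            name = "attachment-%d.bin" % index
        path = os.path.join(out_dir, name)
        with open(path, "wb") as handle:
            # decode=True undoes base64 or quoted-printable; it returns
            # None for multipart parts, hence the fallback.
            handle.write(part.get_payload(decode=True) or b"")
        saved.append(path)
    return saved
```

  The UUENCODE alternative would replace the file write with an
encoding step and append the result to the message body instead.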

> So then why are we having this debate? :-)

  "Debate"?  All I did was throw out an idea about a mail gateway.  :)
 But anyway, to answer the intent of your question: Partly as a
thought exercise.  Hopefully one that proves educational about mail
systems.  (I've learned something already; see below.)  And
partly because mail Luddites like to complain about how their mail
readers display barf instead of mail, so I am curious how easy/hard it
would be to correct for the problem at their ends (since changing the
rest of the world seems unlikely).

  Perhaps I'm making a point, too.  Sometimes it's hard to tell.  ;-)

> One other requirement I forgot earlier: you need to handle RFC
> 2047-compliant non-ASCII email headers as well.

  Ohhh, I didn't know about those.  Neat.  :)  Although, thinking
about it, it should have been evident to me that some such facility
existed.  At work, I see non-Latin characters in mail headers from our
Asian sales rep all the time.

  From an extremely cursory glance at the RFC, it looks like this is
basically quoted-printable applied to mail headers, so that should be
straightforward.
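
  For what it's worth, RFC 2047 defines two encodings for these
"encoded words": "Q", which closely resembles quoted-printable, and
"B", which is base64.  Python's standard library handles both, so a
rough sketch of the decoding step might be as follows.  The function
name decode_rfc2047 and the Latin-1 fallback for unknown charsets
are assumptions made for illustration.

```python
from email.header import decode_header


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded words in a header value to text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to Latin-1, which
                # can decode any byte sequence (an assumption, not
                # something the RFC prescribes).
                chunk = chunk.decode("latin-1")
        parts.append(chunk)
    return "".join(parts)
```

  A gateway aiming for pure 7-bit output would still need to reduce
the decoded text to ASCII afterwards, just as for the message body.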

-- Ben


More information about the gnhlug-discuss mailing list