Converting HTML and MIME to plain text mail

Roger H. Goun roger at bcah.com
Tue Oct 7 17:29:49 EDT 2008


2008/10/7 Ben Scott <dragonhawk at gmail.com>:
> On Tue, Oct 7, 2008 at 3:10 PM, Roger H. Goun <roger at bcah.com> wrote:
>>> * Decode BASE64 or quoted-printable to 7-bit clean plain text
>>
>> This should be decode to 8-bit clean plain text.
>
>  Nope.  Not if you're talking strict RFC-821/822 compliance.  The
> specs say ASCII.  ASCII is properly a 7-bit character code.  The RFC
> reinforces this, going so far as to give acceptable character code
> values as 1 through 127.  RFC-822, Section 3.3.
>
>        http://tools.ietf.org/html/rfc822#section-3.3
>
>  After all, if someone really wants that genuine 1982 email
> experience, I would hate for them to be disappointed.  ;-)

No. In the general case, you have been handed a BASE64 or
quoted-printable encoding of random 8-bit data. (If it wasn't 8-bit
data, there was no reason for the sender or an intermediary to encode
it.) The first step is to remove the encoding, leaving unencoded 8-bit
data, not 7-bit data. The result of this step is similar to what
happens if you configure a modern MTA not to bother encoding 8-bit
data when it is known that the communications channel is 8-bit clean.
After the NEXT step (replace non-ASCII characters), the result will be
7-bit, as you desire, but for now it's still 8-bit.
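Below is a minimal Python sketch of the two steps described above. It assumes the part's body bytes, its Content-Transfer-Encoding, and its declared charset have already been extracted from the message. The function names are hypothetical. Replacing each non-ASCII character with "?" is also an assumption; a real converter might transliterate instead. The output of the first function is still 8-bit, and only the second step gets you down to 7-bit.

```python
import base64
import quopri


def decode_transfer_encoding(body: bytes, cte: str) -> bytes:
    """Step 1 (hypothetical helper): undo the Content-Transfer-Encoding.

    The result is raw bytes that may well contain octets above 127.
    """
    cte = cte.strip().lower()
    if cte == "base64":
        return base64.b64decode(body)
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    # "7bit", "8bit" and "binary" are labels, not encodings: nothing to undo.
    return body


def to_seven_bit(data: bytes, charset: str, placeholder: str = "?") -> bytes:
    """Step 2 (hypothetical helper): replace non-ASCII characters.

    Decodes with the part's declared charset, then keeps only characters in
    the 1..127 range RFC 822 allows. Everything else becomes `placeholder`
    (an assumed policy; the placeholder itself must be ASCII).
    """
    text = data.decode(charset, errors="replace")
    return "".join(
        ch if 0 < ord(ch) < 128 else placeholder for ch in text
    ).encode("ascii")


if __name__ == "__main__":
    raw = decode_transfer_encoding(b"Caf=C3=A9 ouvert", "quoted-printable")
    print(raw)                         # b'Caf\xc3\xa9 ouvert' (still 8-bit)
    print(to_seven_bit(raw, "utf-8"))  # b'Caf? ouvert'
```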

>> You probably want the mail to remain a valid MIME message, just in
>> case the user ever upgrades her MUA.
>
> Leaving the MIME headers when the explicit goal is to remove all MIME
> functionality seems like a waste to me.  They can never be used for
> anything useful.  I would think it better to make it clear that this
> isn't MIME.

I'm changing the requirements. :-) The goal should be to produce
RFC-compliant email (or as close to that as possible) that the ancient
MUA can handle. The result of the conversion process is perfectly
valid MIME -- otherwise I'd never suggest keeping the minimal MIME
headers. Unless the target MUA is so vile that it falls over if it
sees:

MIME-Version: 1.0

in the headers, you should conform to RFC-compliant modern usage.
That's just an application of the "be conservative in what you
produce" philosophy.
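One way to keep the converted mail valid MIME is to copy over the non-MIME headers and then set only the three headers a trivial text/plain message needs. The sketch below uses Python's standard email package. The helper name and the choice to drop every Content-* header from the original are assumptions made for illustration.

```python
from email import message_from_string
from email.message import Message


def rebuild_as_plain_text(original: Message, ascii_body: str) -> Message:
    """Hypothetical helper: wrap an already-converted 7-bit body in a new
    message that keeps the original's non-MIME headers (From, Subject, ...)
    and carries a minimal, valid single-part MIME structure."""
    out = Message()
    for name, value in original.items():
        lowered = name.lower()
        # Drop the old MIME structure; it describes a body we replaced.
        if lowered == "mime-version" or lowered.startswith("content-"):
            continue
        out[name] = value
    # The minimal headers that keep the result RFC-compliant MIME.
    out["MIME-Version"] = "1.0"
    out["Content-Type"] = 'text/plain; charset="us-ascii"'
    out["Content-Transfer-Encoding"] = "7bit"
    out.set_payload(ascii_body)
    return out


if __name__ == "__main__":
    src = message_from_string(
        "From: a@example.com\n"
        "Subject: hello\n"
        "MIME-Version: 1.0\n"
        'Content-Type: text/html; charset="utf-8"\n'
        "Content-Transfer-Encoding: base64\n"
        "\n"
        "PGI+aGVsbG88L2I+\n"
    )
    print(rebuild_as_plain_text(src, "hello\n").as_string())
```

An MUA that ignores MIME entirely will just see three extra headers and a plain body, while a MIME-aware one will see a well-formed single-part message.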

> But hey, I'm not volunteering to write the code.  ;-)

So then why are we having this debate? :-)

One other requirement I forgot earlier: you need to handle RFC
2047-compliant non-ASCII email headers as well.
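Python's standard library can decode RFC 2047 encoded-words via email.header.decode_header and email.header.make_header. Here is a minimal sketch that assumes the same replace-with-"?" policy used for the body. The helper name is hypothetical.

```python
from email.header import decode_header, make_header


def header_to_ascii(raw_value: str, placeholder: str = "?") -> str:
    """Hypothetical helper: decode RFC 2047 encoded-words such as
    =?iso-8859-1?q?p=F6stal?= to Unicode, then replace anything outside
    7-bit ASCII with `placeholder`."""
    decoded = str(make_header(decode_header(raw_value)))
    return "".join(ch if ord(ch) < 128 else placeholder for ch in decoded)


if __name__ == "__main__":
    print(header_to_ascii("=?iso-8859-1?q?p=F6stal?="))  # p?stal
```

Encoded-words are themselves 7-bit ASCII, so leaving them untouched would still be RFC-compliant. Decoding them matters because an old MUA would otherwise show the reader the raw =?...?= text.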

-- Roger


More information about the gnhlug-discuss mailing list