Converting HTML and MIME to plain text mail
Roger H. Goun
roger at bcah.com
Tue Oct 7 17:29:49 EDT 2008
2008/10/7 Ben Scott <dragonhawk at gmail.com>:
> On Tue, Oct 7, 2008 at 3:10 PM, Roger H. Goun <roger at bcah.com> wrote:
>>> * Decode BASE64 or quoted-printable to 7-bit clean plain text
>>
>> This should be "decode to 8-bit clean plain text."
>
> Nope. Not if you're talking strict RFC-821/822 compliance. The
> specs say ASCII. ASCII is properly a 7-bit character code. The RFC
> reinforces this, going so far as to give acceptable character code
> values as 1 through 127. RFC-822, Section 3.3.
>
> http://tools.ietf.org/html/rfc822#section-3.3
>
> After all, if someone really wants that genuine 1982 email
> experience, I would hate for them to be disappointed. ;-)

No. In the general case, you have been handed a BASE64 or
quoted-printable encoding of random 8-bit data. (If it wasn't 8-bit
data, there was no reason for the sender or an intermediary to encode
it.) The first step is to remove the encoding, leaving unencoded 8-bit
data, not 7-bit data. The result of this step is similar to what
happens if you configure a modern MTA not to bother encoding 8-bit
data when it is known that the communications channel is 8-bit clean.
After the NEXT step (replace non-ASCII characters), the result will be
7-bit, as you desire, but for now it's still 8-bit.
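
For what it's worth, here is a rough Python sketch of those two steps.
The function names (decode_body, to_seven_bit) are made up for
illustration, and replacing each non-ASCII character with "?" is only
one assumed policy; a real converter might transliterate instead.

```python
import base64
import quopri

def decode_body(payload: bytes, encoding: str) -> bytes:
    """Step 1: strip the transfer encoding, leaving raw 8-bit data."""
    encoding = encoding.lower()
    if encoding == "base64":
        return base64.b64decode(payload)
    if encoding == "quoted-printable":
        return quopri.decodestring(payload)
    return payload  # 7bit, 8bit, binary: nothing to undo

def to_seven_bit(raw: bytes, charset: str = "utf-8") -> str:
    """Step 2: replace every non-ASCII character with '?' (assumed policy)."""
    text = raw.decode(charset, errors="replace")
    return text.encode("ascii", errors="replace").decode("ascii")

raw = decode_body(b"caf=E9", "quoted-printable")  # still 8-bit here
print(to_seven_bit(raw, "iso-8859-1"))            # 7-bit only after step 2
```

The point is that the output of decode_body is 8-bit data; only the
second function makes it 7-bit.
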
>> You probably want the mail to remain a valid MIME message, just in
>> case the user ever upgrades her MUA.
>
> Leaving the MIME headers when the explicit goal is to remove all MIME
> functionality seems like a waste to me. They can never be used for
> anything useful. I would think it better to make it clear that this
> isn't MIME.

I'm changing the requirements. :-) The goal should be to produce
RFC-compliant email (or as close to that as possible) that the ancient
MUA can handle. The result of the conversion process is perfectly
valid MIME -- otherwise I'd never suggest keeping the minimal MIME
headers. Unless the target MUA is so vile that it falls over if it
sees:

    MIME-Version: 1.0

in the headers, you should conform to RFC-compliant modern usage.
That's just an application of the "be conservative in what you
produce" philosophy.
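
As a sketch of what the converted message could carry, assuming that
"minimal MIME headers" means MIME-Version plus a us-ascii text/plain
Content-Type and a 7bit Content-Transfer-Encoding (my reading, not a
spec requirement), and using a made-up helper name minimal_mime:

```python
from email.message import Message

def minimal_mime(body_ascii: str) -> Message:
    """Wrap an already-converted ASCII body in minimal MIME headers."""
    msg = Message()
    msg["MIME-Version"] = "1.0"
    # Assumed choice of headers: declare plain 7-bit ASCII text.
    msg["Content-Type"] = 'text/plain; charset="us-ascii"'
    msg["Content-Transfer-Encoding"] = "7bit"
    msg.set_payload(body_ascii)
    return msg

print(minimal_mime("Hello, 1982.\n").as_string())
```

An old MUA that ignores unknown headers should show the body
unchanged, while a MIME-aware one still parses it correctly.
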
> But hey, I'm not volunteering to write the code. ;-)

So then why are we having this debate? :-)

One other requirement I forgot earlier: you need to handle RFC
2047-compliant non-ASCII email headers as well.
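
A rough sketch of the header side, using decode_header from Python's
standard email.header module. The helper name header_to_ascii is made
up, and substituting "?" for non-ASCII characters is again just an
assumed policy:

```python
from email.header import decode_header

def header_to_ascii(value: str) -> str:
    """Decode RFC 2047 encoded-words, then replace non-ASCII with '?'."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset label: fall back to ASCII with replacement.
                chunk = chunk.decode("ascii", errors="replace")
        parts.append(chunk)
    text = "".join(parts)
    return text.encode("ascii", errors="replace").decode("ascii")

print(header_to_ascii("=?iso-8859-1?q?caf=E9?="))
```
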
-- Roger