Converting HTML and MIME to plain text mail
    Roger H. Goun 
    roger at bcah.com
       
    Tue Oct  7 17:29:49 EDT 2008
    
    
  
2008/10/7 Ben Scott <dragonhawk at gmail.com>:
> On Tue, Oct 7, 2008 at 3:10 PM, Roger H. Goun <roger at bcah.com> wrote:
>>> * Decode BASE64 or quoted-printable to 7-bit clean plain text
>>
>> This should be "decode to 8-bit clean plain text."
>
>  Nope.  Not if you're talking strict RFC-821/822 compliance.  The
> specs say ASCII.  ASCII is properly a 7-bit character code.  The RFC
> reinforces this, going so far as to give acceptable character code
> values as 1 through 127.  RFC-822, Section 3.3.
>
>        http://tools.ietf.org/html/rfc822#section-3.3
>
>  After all, if someone really wants that genuine 1982 email
> experience, I would hate for them to be disappointed.  ;-)

No. In the general case, you have been handed a BASE64 or
quoted-printable encoding of random 8-bit data. (If it wasn't 8-bit
data, there was no reason for the sender or an intermediary to encode
it.) The first step is to remove the encoding, leaving unencoded 8-bit
data, not 7-bit data. The result of this step is similar to what
happens if you configure a modern MTA not to bother encoding 8-bit
data when it is known that the communications channel is 8-bit clean.
After the NEXT step (replace non-ASCII characters), the result will be
7-bit, as you desire, but for now it's still 8-bit.

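To make the two steps concrete, here is a rough Python sketch, not a
finished tool. The function names decode_body and to_seven_bit are made
up for illustration, and both the ISO-8859-1 default charset and the
"?" replacement character are assumptions; a real converter would take
the charset from the part's Content-Type header.

```python
import base64
import quopri


def decode_body(payload: bytes, encoding: str) -> bytes:
    """Step 1: remove the transfer encoding, leaving raw 8-bit data."""
    enc = encoding.strip().lower()
    if enc == "base64":
        return base64.b64decode(payload)
    if enc == "quoted-printable":
        return quopri.decodestring(payload)
    # 7bit, 8bit and binary carry no encoding to undo.
    return payload


def to_seven_bit(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Step 2: replace every non-ASCII character with '?'.

    The default charset is an assumption made for this sketch.
    """
    text = raw.decode(charset, errors="replace")
    return text.encode("ascii", errors="replace").decode("ascii")


eight_bit = decode_body(b"caf=E9", "quoted-printable")  # still 8-bit here
seven_bit = to_seven_bit(eight_bit)                      # now plain ASCII
```

The intermediate value eight_bit is b"caf\xe9", which is 8-bit data;
only after the second call does it become the 7-bit string "caf?".
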
>> You probably want the mail to remain a valid MIME message, just in
>> case the user ever upgrades her MUA.
>
>  Leaving the MIME headers when the explicit goal is to remove all MIME
> functionality seems like a waste to me.  They can never be used for
> anything useful.  I would think it better to make it clear that this
> isn't MIME.

I'm changing the requirements. :-) The goal should be to produce
RFC-compliant email (or as close to that as possible) that the ancient
MUA can handle. The result of the conversion process is perfectly
valid MIME -- otherwise I'd never suggest keeping the minimal MIME
headers. Unless the target MUA is so vile that it falls over if it
sees:

    MIME-Version: 1.0

in the headers, you should conform to RFC-compliant modern usage.
That's just an application of the "be conservative in what you
produce" philosophy.

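As an illustration of what "minimal MIME headers" could look like on
the converted message, here is a small Python sketch using the standard
email package. The helper name make_plain is hypothetical, and the
exact set of headers (Content-Type and Content-Transfer-Encoding in
addition to MIME-Version) is my assumption about what "minimal" means
here.

```python
from email.message import Message


def make_plain(subject: str, body_ascii: str) -> Message:
    """Wrap an already-converted ASCII body in a minimal, valid MIME message."""
    msg = Message()
    msg["Subject"] = subject
    msg["MIME-Version"] = "1.0"
    # Assumed minimal set: declare the body as 7-bit US-ASCII plain text.
    msg["Content-Type"] = 'text/plain; charset="us-ascii"'
    msg["Content-Transfer-Encoding"] = "7bit"
    msg.set_payload(body_ascii)
    return msg


msg = make_plain("Hello", "Just text.\n")
wire_form = msg.as_string()  # headers, a blank line, then the body
```

An old MUA that ignores unknown headers just displays the body, while a
MIME-aware one sees an ordinary single-part text/plain message.
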
> But hey, I'm not volunteering to write the code.  ;-)

So then why are we having this debate? :-)

One other requirement I forgot earlier: you need to handle RFC
2047-compliant non-ASCII email headers as well.
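
For the header side, a possible approach in Python is sketched below.
It relies on email.header.decode_header from the standard library; the
function name header_to_ascii and the "?" replacement policy are
assumptions made for illustration, and an unknown charset label would
raise LookupError in this sketch.

```python
from email.header import decode_header


def header_to_ascii(value: str) -> str:
    """Decode RFC 2047 encoded-words, then replace non-ASCII with '?'."""
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header yields bytes for encoded-words and may yield
        # str when the header contains no encoded-words at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    text = "".join(parts)
    return text.encode("ascii", errors="replace").decode("ascii")


subject = header_to_ascii("=?iso-8859-1?q?caf=E9?=")
```

Here subject ends up as "caf?", the same two-step treatment as for the
body: undo the encoding first, then squash what isn't ASCII.
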

-- Roger
    
    