ARTICLE - Why the MS Office file formats are so complicated
Alex Hewitt
hewitt_tech at comcast.net
Wed Feb 20 18:51:03 EST 2008
On Wed, 2008-02-20 at 17:23 -0500, Ben Scott wrote:
> On Wed, Feb 20, 2008 at 3:58 PM, Alex Hewitt <hewitt_tech at comcast.net> wrote:
> > A friend of ours wrote a bunch of recipe files using something called
> > Microsoft Write.
>
> Yah, "Windows Write" is/was one of the "accessories" that came with
> Windows 3.x. It morphed into "WordPad" in Windows 95 and later.
> WordPad still exists. It won't write the Write (hah) format anymore,
> but it can read it, and save in some variant of the RTF format.
>
> > Theoretically Microsoft Word is supposed to be able to read such files
> > but I found that the version I was using (Word 2003) wouldn't.
>
> Curious. My install of Word 2003 can. Are you sure you installed
> all the import/export filters? If you did a "Minimal" or "Custom"
> install (instead of the mondo-huge "Full"), I don't think those are
> all included by default.
I sit corrected! ;^) Word 2003 complains about the file, saying in effect
"this might be a virus, but I have a converter that will convert it", and
it does. I think the original reason I wrote the filter was because our
friend didn't have Word and I didn't want to manually edit her 83 files.
I'll see if Word can be called from the command line to do the
converting.
-Alex
>
> > Writing a filter in Python was trivial and I was able to convert the
> > files to plain text.
>
> For future reference, the strings(1) command can be used to much the
> same effect.
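
A minimal Python sketch of the strings(1)-style approach, for anyone without
the command handy. It is not the original filter from this thread, and it is
not a reimplementation of strings(1): the function name printable_runs and the
four-character minimum are assumptions made for illustration. It keeps runs
of printable ASCII (plus tab) and drops everything else.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return each run of printable ASCII that is at least min_len bytes long."""
    # Printable ASCII is 0x20 (space) through 0x7e (~); tab is allowed too.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for a binary document: text mixed with non-ASCII.
sample = b"\x00\x01Recipe: soup\x00\xff\x02ab\x00Add salt\x9c"
print(printable_runs(sample))  # ['Recipe: soup', 'Add salt']
```

To run it over a real file, read the file in binary mode and pass the bytes
in, e.g. printable_runs(open(path, "rb").read()).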
>
> > ... the file itself in ASCII, a series of bytes again in non-ASCII,
> > followed by a repeat of some of the original ASCII.
>
> That sounds very similar to the MS Word .DOC format, and I bet
> they're related. DOC files do not interleave the formatting with the
> text, as (for example) HTML or WordPerfect did. Instead, all the
> plain text is stored in one blob, and then the formatting information
> is stored in a different blob. The formatting directives have
> "pointers" to the position of the text they affect.
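
The "text in one blob, formatting in another" layout described above can be
sketched in Python as follows. This only illustrates the idea and is not the
actual .DOC or .WRI on-disk structure: the class names FormatRun and Document,
the style labels, and the use of character offsets are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FormatRun:
    start: int  # offset of the first character the formatting applies to
    end: int    # offset one past the last character it applies to
    style: str  # hypothetical style label, e.g. "bold"


@dataclass
class Document:
    text: str              # all of the plain text, with no markup in it
    runs: List[FormatRun]  # formatting kept apart, pointing into text

    def styled_spans(self) -> List[Tuple[str, str]]:
        """Resolve each formatting run to the text it points at."""
        return [(self.text[r.start:r.end], r.style) for r in self.runs]


doc = Document(
    text="Preheat oven. Mix flour and sugar.",
    runs=[FormatRun(0, 7, "bold"), FormatRun(18, 23, "italic")],
)
print(doc.styled_spans())  # [('Preheat', 'bold'), ('flour', 'italic')]
```

Because the text is never interleaved with directives, pulling out the plain
text is just reading doc.text, which fits the observation that the ASCII
content sits together in the file.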
>
> The "repeat" you describe is not actually a repeat, but a follow-on
> save. Word and friends work in an interesting fashion. You open the
> file, and it loads the base text blob described above. You start
> making your changes. Those changes go into an undo buffer. That undo
> buffer is actually backstored on the disk in temporary files.
> (That's why a directory containing Word files people are busy editing
> accumulates lots of odd temp files until they close the original.)
>
> When you invoke "Save", the undo buffer -- essentially like a "diff"
> -- gets tacked on to the end of the main file. This made saves fast
> on slow computers already overburdened by Microsoft bloatware. Loads
> were slower, of course, but the reasoning was that people care about
> save speed more than load speed. As you can imagine, if there are
> lots of saves, rebuilding the text is not so easy as running
> strings(1) on it.
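
A rough Python sketch of why that rebuild takes more than strings(1), going
only by the description above: a base text followed by one batch of edits per
save. The edit representation (offset, characters to delete, text to insert)
and the function name rebuild are assumptions for illustration; the real file
layout is not shown here.

```python
from typing import List, Tuple

# One edit: (offset, number of characters to delete, text to insert).
Edit = Tuple[int, int, str]


def rebuild(base: str, saved_batches: List[List[Edit]]) -> str:
    """Replay every saved batch of edits, in order, on top of the base text."""
    text = base
    for batch in saved_batches:
        for offset, delete_count, inserted in batch:
            text = text[:offset] + inserted + text[offset + delete_count:]
    return text


base = "Add 1 cup sugar."
saves = [
    [(4, 1, "2")],       # first save: change "1" to "2"
    [(10, 5, "flour")],  # second save: change "sugar" to "flour"
]
print(rebuild(base, saves))  # Add 2 cup flour.
```

In this model a file holding the base text plus the appended batches contains
both "sugar" and "flour" as readable strings, so a plain scan for ASCII shows
stale text alongside current text, much like the "repeat" described earlier.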
>
> In Word, if you turn off "Fast Saves", it writes out a full, unified
> version of the text instead. This became the default at some point --
> I have no idea when.
>
> > But the interesting thing was that I couldn't easily find a Microsoft tool that
> > understood the format which originated with Windows 95 or an earlier version
> > of Windows.
>
> Start -> Programs -> Accessories -> WordPad
>
> My copy of Win XP Pro opens .WRI files automatically in WordPad. I
> just double-click the file.
>
> WordPad is an optional component for Windows. Perhaps the computer
> was installed with a "minimalist" attitude, so various optional tools
> were not there when you needed them?
>
> -- Ben