ARTICLE - Why the MS Office file formats are so complicated

Alex Hewitt hewitt_tech at comcast.net
Wed Feb 20 15:58:53 EST 2008


On Wed, 2008-02-20 at 13:18 -0500, Ben Scott wrote:
> On Wed, Feb 20, 2008 at 11:52 AM, Michael ODonnell
> <michael.odonnell at comcast.net> wrote:
> >  Quite the tangled mess and very hard to write compliant FOSS
> >  apps against, but (at least on the surface) apparently not
> >  the result of an actively evil intent.
> 
>   A-yup.  Lots of people (me included) have been saying that for
> years.  It really comes down to Hanlon's Razor: "Never attribute to
> malice that which can be adequately explained by stupidity".  And
> let's face it, Microsoft has plenty enough stupidity to go around.
> 
>   In many ways, Microsoft suffers from the result as much as others.
> Can you imagine what having to work with the Windows or Office source
> code must be like?  Code going back decades, much of it poorly
> documented, coding practices evolving with time and marketing fads,
> early stuff written by people who clearly had no clue about how to
> design proper systems... it's a wonder it works at all.  (One could
> argue it doesn't.)  One of the original goals in Vista was to replace
> the legacy code still doing important stuff.  After struggling for two
> years, they *gave up*.  Microsoft can afford more resources than
> just about any software development effort, and they still couldn't
> figure it out.

A friend of ours wrote a bunch of recipe files using something called
Microsoft Write. Files created with that tool have a .wri extension.
Theoretically Microsoft Word is supposed to be able to read such files
but I found that the version I was using (Word 2003) wouldn't. So I
opened a few of the files with a binary editor and found that every file
had an 84 hex byte prefix, then the text itself in ASCII, then a series
of non-ASCII bytes, followed by a repeat of some of the original ASCII.
Writing a filter in Python was trivial and I was able to convert the
files to plain text. Of course some of the lines were now run-on but
overall the cleanup was simple. But the interesting thing was that I
couldn't easily find a Microsoft tool that understood the format, which
originated with Windows 95 or an earlier version of Windows. Along the
way Microsoft had basically given up on the format. I'm sure somewhere
there is a tool that can read those files short of the original platform,
but we're only talking about perhaps a ten-year span since the files
were created, and already they are not readily readable.
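
For anyone facing the same chore, a filter along those lines might look
roughly like the sketch below. It is a reconstruction under assumptions,
not the original script: it reads "84 hex" as a 0x84-byte prefix, it
treats the first non-printable byte after the prefix as the end of the
text, and the name extract_text is made up for illustration. It does not
parse the real Write format.

```python
import string

# Byte values considered part of the readable text (printable ASCII,
# including tab, CR and LF).
PRINTABLE = set(string.printable.encode("ascii"))


def extract_text(data: bytes, prefix_len: int = 0x84) -> str:
    """Return the first run of printable ASCII after a fixed-size prefix.

    prefix_len defaults to 0x84, which is an assumption about what
    "84 hex byte prefix" means; adjust it to match the files at hand.
    """
    body = data[prefix_len:]
    end = len(body)
    for i, b in enumerate(body):
        if b not in PRINTABLE:
            # First non-text byte: everything after it (binary data and
            # the repeated ASCII) is dropped.
            end = i
            break
    text = body[:end].decode("ascii")
    # Normalize DOS line endings so the result is ordinary plain text.
    return text.replace("\r\n", "\n")


if __name__ == "__main__":
    import sys

    # Usage: python wri2txt.py recipe.wri > recipe.txt
    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(extract_text(f.read()))
```

Stopping at the first non-printable byte is what discards both the
binary section and the duplicated ASCII that follows it.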

-Alex

> 
>   Of course, many people still put their critical data in that mess.
> Now *that's* scary.  <gulp>
> 
> -- Ben
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss at mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/


