ARTICLE - Why the MS Office file formats are so complicated

Shawn O'Shea shawn at eth0.net
Thu Feb 21 12:12:37 EST 2008


If you have Python and the pywin32 libraries installed, a script like this
works. Adjust the directories at the beginning. It converts each document to
RTF, keeping the original filename but changing its three-letter extension
to rtf. If you want plain text instead, change "rtf" to "txt" and the format
number passed to SaveAs from 6 to 2. The numbers come from the WdSaveFormat
enumeration (numbered starting from 0):
http://msdn2.microsoft.com/en-us/library/aa220734.aspx

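import os
import win32com.client

# Raw strings keep the backslashes in Windows paths literal.
docsdir = r"C:\Documents and Settings\shawn\My Documents\wordtest"
outdir = r"C:\Documents and Settings\shawn\My Documents\converted"

app = win32com.client.Dispatch("Word.Application")
app.Visible = 1

try:
    for name in os.listdir(docsdir):
        src = os.path.abspath(os.path.join(docsdir, name))
        print(src)
        # Swap only the extension; str.replace() would also change any
        # earlier occurrence of the same three letters in the filename.
        dst = os.path.join(outdir, os.path.splitext(name)[0] + ".rtf")
        doc = app.Documents.Open(src)
        doc.SaveAs(dst, 6)  # 6 = wdFormatRTF; 2 = wdFormatText
        doc.Close()
finally:
    # Quit Word even if one of the documents fails to convert.
    app.Quit()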
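
The filename handling and the format numbers can also be pulled into a small
helper that is easy to check without Word running. This is only a sketch:
output_target and WD_SAVE_FORMATS are names made up for illustration, and
the table lists just the two WdSaveFormat values mentioned above. Splitting
off the extension with os.path.splitext avoids a pitfall of str.replace,
which rewrites every occurrence of the matched letters, not just the last.

```python
import os

# Subset of WdSaveFormat: only the two values discussed in this message.
WD_SAVE_FORMATS = {"txt": 2, "rtf": 6}


def output_target(outdir, filename, ext):
    """Return (output path, WdSaveFormat number) for one input file."""
    base, _old_ext = os.path.splitext(filename)
    return os.path.join(outdir, base + "." + ext), WD_SAVE_FORMATS[ext]


if __name__ == "__main__":
    # The pitfall: replace() hits the whole name when it repeats the suffix.
    print("doc.doc".replace("doc.doc"[-3:], "rtf"))   # rtf.rtf
    # The helper only touches the extension.
    print(output_target("converted", "doc.doc", "rtf"))
```

The returned pair could be passed straight to doc.SaveAs(path, number) in a
loop like the one above.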

-Shawn


On 21 Feb 2008 10:18:57 -0500, Kevin D. Clark <kevin_d_clark at comcast.net>
wrote:
>
>
> Ben Scott writes:
>
>
> > On Wed, Feb 20, 2008 at 6:53 PM, Alex Hewitt wrote:
> > > I just tried to read these files again with Word and it can read them.
> > > I'll see if there's a way to read/convert these files from a batch
> > > job.
> >
> >   Not a traditional batch file, I don't think, but it should be very
> > possible with either (1) VBA or (2) AutoIt.  VBA is "Visual Basic for
> > Applications", is built-in to MS Office, and it can do just about
> > anything, provided you can stomach the syntax.  Still, for something
> > like this, it should do well.  AutoIt is a third-party, freeware
> > automation scripting tool for 'doze.  It's kind of like expect(1) for
> > the Windows GUI.  It's useful for scripting that which is not designed
> > to be scripted.
>
>
> Perl's libwin32 library (Win32::OLE) might be useful here as well.
>
> Here's an example of somebody using this module to control another
> win32 application:
>
>   http://www.perl.com/pub/a/2005/04/21/win32ole.html
>
> Regards,
>
> --kevin
>
> --
> GnuPG ID: B280F24E             Don't you know there ain't no devil,
> alumni.unh.edu!kdc             there's just God when he's drunk?
>                                  -- Tom Waits
>