Using xmlstarlet and OpenOffice

G Rundlett greg.rundlett at gmail.com
Tue Mar 2 23:36:40 EST 2010


On Sun, Feb 28, 2010 at 6:03 PM, Bruce Dawson <jbd at codemeta.com> wrote:

> Has anyone used xmlstarlet (a command-line xml parser) to get data from
> content.xml (OpenOffice) files?
>
>
I had not heard of it before, so thanks for pointing it out.  (Note to
general readers: on my Ubuntu system I had to create a symbolic link 'xml'
to /usr/bin/xmlstarlet to use the 'xml' command referenced in the
documentation.)
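
A minimal sketch of that symlink step, for anyone who wants to avoid touching /usr/bin. The per-user directory name (bin_dir, defaulting to ~/bin) is my assumption, not something from the xmlstarlet documentation, and it only helps if that directory is already on your PATH:

```shell
# Assumption: xmlstarlet is installed at /usr/bin/xmlstarlet (as on the Ubuntu
# system described above) and a per-user bin directory is on PATH.
bin_dir="${BIN_DIR:-$HOME/bin}"

# Create the directory if it is missing, then point 'xml' at xmlstarlet.
mkdir -p "$bin_dir"
ln -sf /usr/bin/xmlstarlet "$bin_dir/xml"
```

After this, 'xml' and 'xmlstarlet' should behave identically.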


> It seems to be complaining about Undefined namespace prefix, and I can't
> seem to figure out what it wants.
>

I was able to get results by specifying more/all of the namespaces used in
the document in question.

For example:

xml select \
  -N office='urn:oasis:names:tc:opendocument:xmlns:office:1.0' \
  -N table='urn:oasis:names:tc:opendocument:xmlns:table:1.0' \
  -N draw='urn:oasis:names:tc:opendocument:xmlns:drawing:1.0' \
  -N fo='urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0' \
  -N xlink='http://www.w3.org/1999/xlink' \
  -N dc='http://purl.org/dc/elements/1.1/' \
  -N meta='urn:oasis:names:tc:opendocument:xmlns:meta:1.0' \
  -N number='urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0' \
  -N svg='urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0' \
  -N chart='urn:oasis:names:tc:opendocument:xmlns:chart:1.0' \
  -N dr3d='urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0' \
  -N form='urn:oasis:names:tc:opendocument:xmlns:form:1.0' \
  -N script='urn:oasis:names:tc:opendocument:xmlns:script:1.0' \
  -N ooo='http://openoffice.org/2004/office' \
  -N ooow='http://openoffice.org/2004/writer' \
  -N oooc='http://openoffice.org/2004/calc' \
  -N dom='http://www.w3.org/2001/xml-events' \
  -N xforms='http://www.w3.org/2002/xforms' \
  -N xsd='http://www.w3.org/2001/XMLSchema' \
  -N xsi='http://www.w3.org/2001/XMLSchema-instance' \
  -N rpt='http://openoffice.org/2005/report' \
  -N of='urn:oasis:names:tc:opendocument:xmlns:of:1.2' \
  -N rdfa='http://docs.oasis-open.org/opendocument/meta/rdfa#' \
  -N field='urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0' \
  -N formx='urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0' \
  --text --template --value-of office:document-content content.xml

Or,

xmlstarlet select -T \
  -N office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" \
  -N xlink="http://www.w3.org/1999/xlink" \
  -N dc="http://purl.org/dc/elements/1.1/" \
  -N meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" \
  -N ooo="http://openoffice.org/2004/office" \
  -t -v office:document-meta/office:meta/meta:generator meta.xml

I found it easiest to identify all the namespaces used in the document
with the "el" and "ed" commands, e.g.
xml elements -v content.xml
or
xml edit -v content.xml
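
If you would rather not type the -N options by hand, here is a rough sketch that builds them from the xmlns declarations in the file itself. It uses only grep, sort and sed; the function name ns_args is hypothetical, and it assumes the declarations are written with double quotes (as in the OpenOffice files I looked at) and that no URI contains whitespace:

```shell
# Hypothetical helper: print one "-N prefix=uri" line for each distinct
# xmlns:prefix="uri" declaration found in the file given as $1.
ns_args() {
    grep -o 'xmlns:[A-Za-z0-9_.-]*="[^"]*"' "$1" |
        sort -u |
        sed -e 's/^xmlns:\([^=]*\)="\(.*\)"$/-N \1=\2/'
}
```

It could then be used along these lines (left unquoted on purpose so the shell splits the output into separate arguments):

xml select $(ns_args content.xml) -T -t -v office:document-content content.xml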

The online PDF reference was helpful as a cheatsheet:
http://xmlstar.sourceforge.net/doc/xmlstarlet.pdf

~ Greg