Using xmlstarlet and OpenOffice
Bruce Dawson
jbd at codemeta.com
Wed Mar 3 12:21:15 EST 2010
[WARNING: Lots of code here...]
Greg: Thanks for following up on this. I too thought xmlstarlet would
be an excellent tool, but I'm having problems running it.
I'm attempting to print out an input field using the script:
#!/bin/bash
# $Header$
# Fetch the value of a field from an OO document
xmlstarlet select --net -T \
-N office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" \
-N style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" \
-N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
-N table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" \
-N draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" \
-N fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" \
-N xlink="http://www.w3.org/1999/xlink" \
-N dc="http://purl.org/dc/elements/1.1/" \
-N meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" \
-N number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" \
-N svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" \
-N chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" \
-N dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" \
-N math="http://www.w3.org/1998/Math/MathML" \
-N form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" \
-N script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" \
-N ooo="http://openoffice.org/2004/office" \
-N ooow="http://openoffice.org/2004/writer" \
-N oooc="http://openoffice.org/2004/calc" \
-N dom="http://www.w3.org/2001/xml-events" \
-N xforms="http://www.w3.org/2002/xforms" \
-N xsd="http://www.w3.org/2001/XMLSchema" \
-N xsi="http://www.w3.org/2001/XMLSchema-instance" \
-N field="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0" \
-t -v office:document-content/office:body/office:text/text:p/text:span/text:text-input content.xml
But this only produces:
XPath error : Undefined namespace prefix
xmlXPathCompiledEval: evaluation failed
runtime error: element value-of
XPath evaluation returned no result.
Note that before running that command, I unzipped
http://www.brucedawson.com/files/SomeFarmRentalAgreement.odt into the
current directory. (That file is test data, feel free to look at it.)
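An .odt file is an ordinary zip archive, so the unpacking step can be limited to the one member the script reads. A minimal sketch, assuming the Info-ZIP unzip tool is installed; the function name extract_content is made up for illustration:

```shell
# Pull only content.xml out of an OpenDocument file.
# $1 = path to the .odt file, $2 = target directory (defaults to the current one)
extract_content() {
    # -o overwrites an existing content.xml without prompting, -q keeps unzip quiet
    unzip -o -q "$1" content.xml -d "${2:-.}"
}
```

For the test file mentioned above this would be called as: extract_content SomeFarmRentalAgreement.odt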
The most frustrating part is that I don't know which namespace prefix it
is having problems with. And I'm including all the namespaces mentioned
in the content.xml!
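One way to narrow down which prefix is at fault is to compare the prefixes the XPath expression uses against the xmlns declarations in the file. The following is only a rough sketch using grep and sed, not anything xmlstarlet provides; the function names are invented here, and the prefix matching is a heuristic that can be fooled by colons inside string literals:

```shell
# Print each distinct xmlns:prefix="uri" declaration found in an XML file.
list_namespaces() {
    grep -o 'xmlns:[A-Za-z0-9_.-]*="[^"]*"' "$1" | sort -u
}

# Print the distinct namespace prefixes used in an XPath expression,
# i.e. every name that is directly followed by ":" and another name.
xpath_prefixes() {
    printf '%s\n' "$1" \
        | grep -o '[A-Za-z_][A-Za-z0-9_.-]*:[A-Za-z_]' \
        | sed 's/:.*//' \
        | sort -u
}

# Print the prefixes used in the XPath ($1) that are not declared in the file ($2).
undeclared_prefixes() {
    for p in $(xpath_prefixes "$1"); do
        list_namespaces "$2" | grep -q "^xmlns:$p=" || printf '%s\n' "$p"
    done
}
```

Running list_namespaces content.xml also gives the exact URIs to copy into the -N options, since a prefix bound to a slightly different URI will not match the document.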
Of course, the script will eventually be extended to extract a
particular element (like RenterDriverName, or AgreementDate, or ...),
but I need to get it to successfully parse the whole thing first.
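For the later step of pulling out a single named field, something along these lines might work. It is a sketch under two assumptions that the script above does not confirm: that each input field carries its name in the text:description attribute of text:text-input, and that the field can sit at any depth in the document (hence the // search). The function name get_field is hypothetical:

```shell
# Namespace URI for the text: prefix, as used in the script above.
TEXT_NS="urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Print the value of the input field named $1 in the content.xml file $2.
# ASSUMPTION: the field name is stored in the text:description attribute.
get_field() {
    xmlstarlet sel -N text="$TEXT_NS" -t \
        -v "//text:text-input[@text:description='$1']" -n "$2"
}
```

A call would then look like: get_field AgreementDate content.xml. Only the text prefix needs a -N binding here, because it is the only prefix the expression uses.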
--Bruce
G Rundlett wrote:
> On Sun, Feb 28, 2010 at 6:03 PM, Bruce Dawson <jbd at codemeta.com> wrote:
>
> Has anyone used xmlstarlet (a command-line XML parser) to get data from
> content.xml (OpenOffice) files?
>
>
> I had not heard of it before, so thanks for pointing it out. (Note to
> general readers: on my Ubuntu system I had to create a symbolic link
> 'xml' to /usr/bin/xmlstarlet to use the 'xml' command referenced in
> the documentation.)
>
>
> It seems to be complaining about an undefined namespace prefix, and I
> can't seem to figure out what it wants.
>
>
> I was able to get results by specifying more/all of the namespaces
> used in the document in question.
>
> For example:
>
> xml select -N
> office='urn:oasis:names:tc:opendocument:xmlns:office:1.0' -N
> table='urn:oasis:names:tc:opendocument:xmlns:table:1.0' -N
> draw='urn:oasis:names:tc:opendocument:xmlns:drawing:1.0' -N
> fo='urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0' -N
> xlink='http://www.w3.org/1999/xlink' -N
> dc='http://purl.org/dc/elements/1.1/' -N
> meta='urn:oasis:names:tc:opendocument:xmlns:meta:1.0' -N
> number='urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0' -N
> svg='urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0' -N
> chart='urn:oasis:names:tc:opendocument:xmlns:chart:1.0' -N
> dr3d='urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0' -N
> form='urn:oasis:names:tc:opendocument:xmlns:form:1.0' -N
> script='urn:oasis:names:tc:opendocument:xmlns:script:1.0' -N
> ooo='http://openoffice.org/2004/office' -N
> ooow='http://openoffice.org/2004/writer' -N
> oooc='http://openoffice.org/2004/calc' -N
> dom='http://www.w3.org/2001/xml-events' -N
> xforms='http://www.w3.org/2002/xforms' -N
> xsd='http://www.w3.org/2001/XMLSchema' -N
> xsi='http://www.w3.org/2001/XMLSchema-instance' -N
> rpt='http://openoffice.org/2005/report' -N
> of='urn:oasis:names:tc:opendocument:xmlns:of:1.2' -N
> rdfa='http://docs.oasis-open.org/opendocument/meta/rdfa#' -N
> field='urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0'
> -N
> formx='urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0'
> --text --template --value-of office:document-content content.xml
>
> Or,
>
> xmlstarlet select -T -N
> office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" -N
> xlink="http://www.w3.org/1999/xlink" -N
> dc="http://purl.org/dc/elements/1.1/" -N
> meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" -N
> ooo="http://openoffice.org/2004/office" -t -v
> office:document-meta/office:meta/meta:generator meta.xml
>
> I found it was easiest to identify all the namespaces used in the
> document by using the "el" and "ed" commands:
> i.e.
> xml elements -v content.xml
> or
> xml edit -v content.xml
>
> The online reference in PDF was helpful as a cheatsheet
> http://xmlstar.sourceforge.net/doc/xmlstarlet.pdf
>
> ~ Greg
>
>
More information about the gnhlug-discuss mailing list