Using xmlstarlet and OpenOffice

Bruce Dawson jbd at codemeta.com
Wed Mar 3 12:21:15 EST 2010


[WARNING: Lots of code here...]

Greg: Thanks for following up on this. I too thought xmlstarlet would
be an excellent tool, but I'm having problems running it.

I'm attempting to print out an input field using the script:

    #!/bin/bash
    # $Header$

    # Fetch the value of a field from an OO document

    xmlstarlet select --net -T \
      -N office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" \
      -N style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" \
      -N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
      -N table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" \
      -N draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" \
      -N fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" \
      -N xlink="http://www.w3.org/1999/xlink" \
      -N dc="http://purl.org/dc/elements/1.1/" \
      -N meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" \
      -N number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" \
      -N svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" \
      -N chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" \
      -N dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" \
      -N math="http://www.w3.org/1998/Math/MathML" \
      -N form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" \
      -N script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" \
      -N ooo="http://openoffice.org/2004/office" \
      -N ooow="http://openoffice.org/2004/writer" \
      -N oooc="http://openoffice.org/2004/calc" \
      -N dom="http://www.w3.org/2001/xml-events" \
      -N xforms="http://www.w3.org/2002/xforms" \
      -N xsd="http://www.w3.org/2001/XMLSchema" \
      -N xsi="http://www.w3.org/2001/XMLSchema-instance" \
      -N field="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0" \
      -t -v office:document-content/office:body/office:text/text:p/text:span/text:text-input \
      content.xml

But this only produces:

    XPath error : Undefined namespace prefix
    xmlXPathCompiledEval: evaluation failed
    runtime error: element value-of
    XPath evaluation returned no result.

Note that before running that command, I unzipped 
http://www.brucedawson.com/files/SomeFarmRentalAgreement.odt into the
current directory. (That file is test data, feel free to look at it.)

The most frustrating part is that I don't know which namespace prefix it
is having problems with. And I'm including all the namespaces mentioned
in the content.xml!
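
One way to double-check the list of prefixes is to pull the xmlns
declarations straight out of content.xml instead of copying them by
hand. The following is only a minimal sketch: the function name
list_ns_args is made up for illustration, and it assumes the
declarations use double-quoted values (as OpenOffice writes them).

```shell
#!/bin/bash
# Hypothetical helper (not part of xmlstarlet): print one
# "-N prefix=uri" line for each distinct xmlns:prefix="uri"
# declaration found in the given XML file.
# Assumption: attribute values are double-quoted.
list_ns_args() {
    grep -o 'xmlns:[A-Za-z_][A-Za-z0-9_.-]*="[^"]*"' "$1" |
        sort -u |
        sed -e 's/^xmlns:\([^=]*\)="\(.*\)"$/-N \1=\2/'
}
```

Running list_ns_args content.xml prints the declared prefixes one per
line, which can be compared against the -N options in the script. Any
prefix used in the XPath expression but missing from that output would
be a candidate for the "Undefined namespace prefix" error.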

Of course, the script will eventually be extended to extract a
particular element (like RenterDriverName, or AgreementDate, or ...),
but I need to get it to successfully parse the whole thing first.
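
For what it's worth, the extended version might look roughly like the
sketch below. It is untested against the real document and rests on an
assumption: that the field name (RenterDriverName and so on) is stored
in the text:description attribute of each text:text-input element. The
function name get_input_field is hypothetical, and field names
containing a single quote are not handled.

```shell
#!/bin/bash
# Hypothetical helper: print the text of every text:text-input element
# whose text:description attribute equals the given field name.
# Assumption: the document keeps the field name in text:description.
# Usage: get_input_field NAME [FILE]   (FILE defaults to content.xml)
get_input_field() {
    local name=$1
    local file=${2:-content.xml}
    xmlstarlet select \
      -N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
      -T -t -v "//text:text-input[@text:description='$name']" -n \
      "$file"
}
```

Only the text namespace is declared here because it is the only prefix
the XPath expression uses. Called as get_input_field RenterDriverName,
it would print the matching value if the assumption about
text:description holds.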

--Bruce

G Rundlett wrote:
> On Sun, Feb 28, 2010 at 6:03 PM, Bruce Dawson <jbd at codemeta.com>
> wrote:
>
>     Has anyone used xmlstarlet (a command-line xml parser) to get data
>     from content.xml (OpenOffice) files?
>
>
> I had not heard of it before, so thanks for pointing it out.  (Note to
> general readers: on my Ubuntu system I had to create a symbolic link
> 'xml' to /usr/bin/xmlstarlet to use the 'xml' command referenced in
> the documentation.)
>  
>
>     It seems to be complaining about Undefined namespace prefix, and I
>     can't seem to figure out what it wants.
>
>
> I was able to get results by specifying more/all of the namespaces
> used in the document in question.
>
> For example:
>
> xml select -N :1.0' -N
> table='urn:oasis:names:tc:opendocument:xmlns:table:1.0' -N
> draw='urn:oasis:names:tc:opendocument:xmlns:drawing:1.0' -N
> fo='urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0' -N
> xlink='http://www.w3.org/1999/xlink' -N
> dc='http://purl.org/dc/elements/1.1/' -N
> meta='urn:oasis:names:tc:opendocument:xmlns:meta:1.0' -N
> number='urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0' -N
> svg='urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0' -N
> chart='urn:oasis:names:tc:opendocument:xmlns:chart:1.0' -N
> dr3d='urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0' -N
> form='urn:oasis:names:tc:opendocument:xmlns:form:1.0' -N
> script='urn:oasis:names:tc:opendocument:xmlns:script:1.0' -N
> ooo='http://openoffice.org/2004/office' -N
> ooow='http://openoffice.org/2004/writer' -N
> oooc='http://openoffice.org/2004/calc' -N
> dom='http://www.w3.org/2001/xml-events' -N
> xforms='http://www.w3.org/2002/xforms' -N
> xsd='http://www.w3.org/2001/XMLSchema' -N
> xsi='http://www.w3.org/2001/XMLSchema-instance' -N
> rpt='http://openoffice.org/2005/report' -N
> of='urn:oasis:names:tc:opendocument:xmlns:of:1.2' -N
> rdfa='http://docs.oasis-open.org/opendocument/meta/rdfa#' -N
> field='urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:field:1.0'
> -N
> formx='urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0'
> --text --template --value-of office:document-content content.xml
>
> Or,
>
> xmlstarlet select -T -N
> office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" -N
> xlink="http://www.w3.org/1999/xlink" -N
> dc="http://purl.org/dc/elements/1.1/" -N
> meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" -N
> ooo="http://openoffice.org/2004/office" -t -v
> office:document-meta/office:meta/meta:generator meta.xml
>
> I found it was easiest to identify all the namespaces used in the
> document by using the "el" and "ed" commands, i.e.
> xml elements -v content.xml
> or
> xml edit -v content.xml
>
> The online reference in PDF was helpful as a cheatsheet:
> http://xmlstar.sourceforge.net/doc/xmlstarlet.pdf
>
> ~ Greg
>
>


