XML Ramble
Perl, as always, gives you various ways to do things. Processing XML is no exception.
Here I offer a few comments on XML::XPath, XML::Parser (and so XML::DOM), and XML::SAX.
Perl has a number of mini-languages in it. Take:
$x = './x +';
From there we could have:
$a = q|$x|; $b = qq|$x|; $c = qr|$x|; $d = qx|$x|;
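To make the point concrete, here is a small sketch of how the same string is read by three of those mini-languages. It only illustrates core Perl quoting. The variable names are mine, and qx is deliberately left unexecuted because it would hand './x +' to the shell as a command.

```perl
use strict;
use warnings;

my $x = './x +';

my $single = q|$x|;    # q : no interpolation, literally the two characters '$x'
my $double = qq|$x|;   # qq: interpolates, giving the string './x +'
my $regex  = qr|$x|;   # qr: same text, compiled as a regex: any char, '/', 'x', one or more spaces
# qx|$x| would run './x +' as a shell command, so it is not executed here.

print "$single\n$double\n";
print "regex matches 'a/x  '\n"       if 'a/x  ' =~ $regex;
print "regex does not match './x+'\n" unless './x+' =~ $regex;
```

The same six characters are a literal, a string, and a pattern, depending only on which quote-like operator surrounds them.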
Then there are XS, POD, the full monty of regular expressions, even here documents, etc. So it comes as no real surprise when the XML::XPath docs say:
"nodeset = find($path, [$context])
The find function takes an XPath expression (a string) and returns either an XML::XPath::NodeSet object containing the nodes it found (or empty if no nodes matched the path), or one of XML::XPath::Literal (a string), XML::XPath::Number, or XML::XPath::Boolean."
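A minimal sketch of those return types, assuming XML::XPath is installed and using a made-up sample document. Each expression goes through find(), and ref() shows which class comes back. The scalar classes (Literal, Number, Boolean) all provide value().

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document, for illustration only.
my $xp = XML::XPath->new(
    xml => '<html><head><title>Home</title></head><body><p>One</p><p>Two</p></body></html>',
);

my %kind;    # expression => class of the object find() returned
for my $expr ('/html/body/p', 'string(/html/head/title)',
              'count(/html/body/p)', 'count(/html/body/p) > 1') {
    my $result = $xp->find($expr);
    $kind{$expr} = ref $result;
    printf "%-28s => %s\n", $expr, ref $result;
}

# Scalar results are unwrapped with value(); node sets are iterated instead.
my $title = $xp->find('string(/html/head/title)')->value;
my $count = $xp->find('count(/html/body/p)')->value;
```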
Note the phrase 'XPath expression'. This is the XPath mini-language I referred to in my original posting. The return value could be construed as belonging to yet another mini-language.
As I see it, this has 2 consequences:
a) You need to know the exact structure of the XML document being parsed in order to craft meaningful XPath expressions. Of course, you normally do know the structure, and most likely you need to know it no matter which module you use. Nevertheless, this affects the generality of the code you write.
b) Your code will be procedural in structure, and you'll basically process the XML document with 'for' statements. E.g. (again from the XML::XPath docs):
    my $nodeset = $xp->find('/html/body/p'); # find all paragraphs
    foreach my $node ($nodeset->get_nodelist) {
        print "FOUND\n\n",
            XML::XPath::XMLParser::as_string($node),
            "\n\n";
    }
There's nothing wrong with this method of analyzing an XML document. If it does the job - fine. I think of it as somewhat plodding: a line-by-line approach (what I called 'linear' in my original posting).
For an XML document with any sort of structure, the 'for' loops are going to become more and more nested, and it's this aspect of the XPath approach which I feel uneasy about.
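To illustrate the nesting, here is a sketch over a hypothetical book/chapter/section/p document, assuming XML::XPath is installed. Each level of structure adds another loop, with the outer node passed as the $context argument of find(). findvalue() returns objects whose stringification is overloaded, so they can be concatenated directly.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical document: chapters contain sections, sections contain paragraphs.
my $xml = <<'XML';
<book>
  <chapter title="One">
    <section title="1.1"><p>a</p><p>b</p></section>
    <section title="1.2"><p>c</p></section>
  </chapter>
  <chapter title="Two">
    <section title="2.1"><p>d</p></section>
  </chapter>
</book>
XML

my $xp = XML::XPath->new(xml => $xml);
my @lines;

# One loop per level; the outer node is the context for the inner find().
for my $chapter ($xp->find('/book/chapter')->get_nodelist) {
    push @lines, 'Chapter ' . $xp->findvalue('@title', $chapter);
    for my $section ($xp->find('section', $chapter)->get_nodelist) {
        push @lines, '  Section ' . $xp->findvalue('@title', $section);
        for my $para ($xp->find('p', $section)->get_nodelist) {
            push @lines, '    ' . $xp->findvalue('.', $para);
        }
    }
}

print "$_\n" for @lines;
```

Adding a fourth level (say, footnotes inside paragraphs) would mean a fourth loop.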
Also, it's the same old sort of coding I write every day. I'm learning XPath expressions, sure, but I'm learning nothing new about Perl, or new ways of thinking. This contrasts quite strongly with the call-back structure of code using XML::Parser and XML::SAX. Some people would say this is a good thing, in that the module (XML::XPath) is shielding me from certain complexities of the problem at hand. Hmm.
Lastly, there is neither need nor expectation of using OO code here.
XML::Parser is probably the module most of us cut our teeth on, either directly or via XML::DOM.
This module is call-back (handler) based, and hence you have to think quite differently about the processing logic as compared to using XML::XPath.
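For comparison, a minimal call-back sketch with XML::Parser, assuming the module is installed. The sample document, the 'page' element and the handler sub names are all hypothetical. The point is that the parser calls your subs, and any state has to be carried between calls (here in file-level variables).

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document, for illustration only.
my $xml = '<site><page name="home">Welcome</page><page name="about">About me</page></site>';

my @pages;      # accumulates [name, text] pairs
my $current;    # the page currently being read, if any

# Start handler: called as ($expat, $element_name, %attributes).
sub on_start {
    my ($expat, $element, %attr) = @_;
    $current = [ $attr{name}, '' ] if $element eq 'page';
}

# Char handler: may be called several times for one run of text.
sub on_char {
    my ($expat, $text) = @_;
    $current->[1] .= $text if $current;
}

# End handler: called as ($expat, $element_name).
sub on_end {
    my ($expat, $element) = @_;
    if ($element eq 'page' && $current) {
        push @pages, $current;
        undef $current;
    }
}

my $parser = XML::Parser->new(
    Handlers => { Start => \&on_start, Char => \&on_char, End => \&on_end },
);
$parser->parse($xml);

print "$_->[0]: $_->[1]\n" for @pages;
```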
Actually, I think being comfortable with call-backs is an important skill, and modules like this are a good, and well-documented, way of developing this skill.
This particular module is stable and famous. And, if it does the job - fine.
I used XML::Parser when I decided to convert my web site from HTML files to a single XML file, and needed a way of reconstructing the HTML from the XML. In the process I learned a lot. I became more comfortable with call-backs, and am very glad I went through this particular learning process.
But I was never happy with the structure of the code, although I was very pleased with it as a first attempt (i.e. ambivalent). I started programming as a uni student in 1970 (gasp), and I could feel in my bones that this code just wasn't quite right.
Again, there is neither need nor expectation of using OO code here.
However, time marches on, and so does software technology. Enter SAX.
With XML::SAX, the call-backs live in their own module, a handler class based on XML::SAX::Base.
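A minimal sketch of that layout, assuming XML::SAX (and at least one SAX parser, such as its bundled pure-Perl one) is installed. The package name My::PageHandler and the 'page' element are hypothetical, and the package is inlined only to keep the example in one file. Note that the state lives in the handler object itself rather than in globals.

```perl
use strict;
use warnings;

# Hypothetical handler class; in a real project it would be its own .pm file.
package My::PageHandler;
use parent 'XML::SAX::Base';

sub start_document {
    my ($self, $doc) = @_;
    $self->{pages} = [];    # all accumulated data hangs off the object
}

sub start_element {
    my ($self, $el) = @_;
    return unless $el->{LocalName} eq 'page';
    # SAX2 attributes are keyed by '{namespace-uri}local-name'.
    my $name = $el->{Attributes}{'{}name'}{Value};
    $self->{current} = { name => $name, text => '' };
}

sub characters {
    my ($self, $chars) = @_;
    $self->{current}{text} .= $chars->{Data} if $self->{current};
}

sub end_element {
    my ($self, $el) = @_;
    return unless $el->{LocalName} eq 'page' && $self->{current};
    push @{ $self->{pages} }, delete $self->{current};
}

package main;
use XML::SAX::ParserFactory;

my $handler = My::PageHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string(
    '<site><page name="home">Welcome</page><page name="about">About me</page></site>');

print "$_->{name}: $_->{text}\n" for @{ $handler->{pages} };
```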
I've learned a lot in the last 3 years, of course, but even so, using SAX has been a revelation. It's a vastly more natural way of processing XML.
My web site XML file was 4,017 lines, almost all double spaced, and the original Perl code was 335 lines. Obviously no big deal.
That program also contained the HTML, which had the CSS embedded in it. In the new design, the CSS is in its own file.
I've now simplified the XML file slightly, so it's down to 3,678 lines. The SAX-based Perl is in 2 parts: a handler module and a main program.
One instance of this handler object exists during the parse, and it accumulates all the data required to crank out HTML at the end of the parse.
I think this is an exceptionally neat way of parsing XML, and am delighted with the result.
Let me emphasise, I am not an OO fanatic. POP (Plain Old Perl) is usually fine by me. This project just happens to be one where the OO approach effortlessly leads to a superior design. And by superior, I mean clearer, neater, shorter.
The handler object contains no reference whatsoever to the CSS which will be used to format the HTML.
Almost all of the main program is taken up with generating HTML tables out of the data accumulated in the handler object during the parse. It's here too that the CSS is invoked.
I haven't bothered to write methods which return bits and pieces of the data as required by the main program. I just use the handler object as a hash ref and so access the data directly. Ok, so the OO purists will be going tut-tut. Who cares?
Thus I create the HTML first, and can then tweak the CSS and review the result to my heart's content.
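A sketch of the main-program side, under stated assumptions: the handler object is faked with a blessed hash standing in for what the parse would leave behind, and the {pages} layout, the class names and site.css are all hypothetical. The data is read straight out of the object as a hash ref, and all styling is deferred to class names in the external stylesheet.

```perl
use strict;
use warnings;

# Stand-in for the handler object left behind by the SAX parse.
# Package name and {pages} layout are hypothetical.
my $handler = bless {
    pages => [
        { name => 'home',  text => 'Welcome' },
        { name => 'about', text => 'About me & contact' },
    ],
}, 'My::PageHandler';

# Minimal HTML escaping for text content and attribute values.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Access the data directly as a hash ref (no accessor methods).
my @rows;
for my $page (@{ $handler->{pages} }) {
    push @rows, sprintf(qq{  <tr><td class="name">%s</td><td class="text">%s</td></tr>\n},
        escape_html($page->{name}), escape_html($page->{text}));
}

# Only class names appear here; the look is decided by the separate CSS file.
my $html = join '',
    qq{<html>\n<head>\n<link rel="stylesheet" type="text/css" href="site.css">\n</head>\n<body>\n},
    qq{<table class="pages">\n},
    @rows,
    qq{</table>\n</body>\n</html>\n};

print $html;
```

Because the HTML only names classes, the stylesheet can be edited and the result reviewed without rerunning the parse.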
Not much of a code saving, if that was the intention. But then, it was not my intention.