Xml ramble

Table of Contents

Name
Overview
XML::XPath
XML::Parser
XML::SAX
Resources
Author
Licence

Name

XML Ramble

Overview

Perl, as always, gives you various ways to do things. Processing XML is no exception.

Here I offer a few comments on XML::XPath, XML::Parser (and hence XML::DOM, which is built on it), and XML::SAX.

XML::XPath

Perl has a number of mini-languages in it. Take:

	$x = './x +';

From there we could have:

	$a = q|$x|;
	$b = qq|$x|;
	$c = qr|$x|;
	$d = qx|$x|;
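As a minimal sketch of how differently those operators treat the same text, the snippet below runs three of them and prints the results. The variable names and the test string 'a/x ' are made up for illustration, and qx is deliberately not executed, since it would pass the contents of $x to the shell as a command.

```perl
use strict;
use warnings;

my $x = './x +';

my $single = q|$x|;     # no interpolation: the two literal characters $ and x
my $double = qq|$x|;    # interpolation: the contents of $x
my $regex  = qr|$x|;    # the contents of $x compiled as a regular expression

# qx|$x| would ask the shell to run a program called ./x with the argument +,
# so it is left out of this sketch.

print "q:  $single\n";
print "qq: $double\n";
print "qr: ", ('a/x ' =~ $regex ? 'matches' : 'does not match'), " 'a/x '\n";
```

Same characters between the delimiters, three different meanings: a literal string, an interpolated string, and a pattern.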

Then there are XS, POD, the full monty of regular expressions, even here documents, etc. So it comes as no real surprise when the XML::XPath docs say:

	"nodeset = find($path, [$context])

The find function takes an XPath expression (a string) and returns either an XML::XPath::NodeSet object containing the nodes it found (or empty if no nodes matched the path), or one of XML::XPath::Literal (a string), XML::XPath::Number, or XML::XPath::Boolean."

Note the phrase 'XPath expression'. This is the XPath mini-language I referred to in my original posting. The return value could be construed as belonging to yet another mini-language.
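To make the point about return values concrete, here is a small sketch that asks find() four different kinds of question and prints the class of each answer. It assumes XML::XPath is installed from CPAN; the inline document and the hash keys are invented for the example.

```perl
use strict;
use warnings;
use XML::XPath;

# A tiny inline document, made up for this sketch.
my $xp = XML::XPath->new(
    xml => '<html><body><p>one</p><p>two</p></body></html>',
);

my %result = (
    nodes => $xp->find('/html/body/p'),             # a location path
    count => $xp->find('count(/html/body/p)'),      # a numeric function
    text  => $xp->find('string(/html/body/p[1])'),  # a string function
    truth => $xp->find('count(/html/body/p) > 1'),  # a comparison
);

# Print the class of object each expression produced.
printf "%-6s %s\n", $_, ref $result{$_} for sort keys %result;
```

The caller has to know which kind of object a given expression yields before it can use the result, which is part of what makes the return value feel like a small language of its own.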

As I see it, there are 2 implications:

a) You need to know the exact structure of the XML document being parsed in order to craft meaningful XPath expressions. Of course, you normally do know the structure, and most likely you need to know it no matter which module you use. Nevertheless, this affects the generality of the code you write.

b) Your code will be procedural in structure, and you'll basically process the XML document with 'for' statements. E.g. (again from the XML::XPath docs):

	my $nodeset = $xp->find('/html/body/p'); # find all paragraphs
	foreach my $node ($nodeset->get_nodelist) {
		print "FOUND\n\n",
			XML::XPath::XMLParser::as_string($node),
			"\n\n";
	}

There's nothing wrong with this method of analyzing an XML document. If it does the job - fine. I think of this method as a bit like plodding, or as a line-by-line approach (what I called 'linear' in my original posting).

For an XML document with any sort of structure, the 'for' loops are going to become more and more nested, and it's this aspect of the XPath approach which I feel uneasy about.
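A rough sketch of that nesting, using a made-up two-level document: one loop per level of structure, with each inner find() taking the outer node as its context. It assumes XML::XPath is installed; the element and attribute names (site, section, page, name) are hypothetical and not taken from my real file.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical document structure, invented for this sketch.
my $xml = <<'EOS';
<site>
  <section name="Perl">
    <page>Modules</page>
    <page>Articles</page>
  </section>
  <section name="XML">
    <page>Resources</page>
  </section>
</site>
EOS

my $xp = XML::XPath->new(xml => $xml);
my @lines;

# Outer loop: one pass per <section>.
foreach my $section ($xp->find('/site/section')->get_nodelist) {
    my $name = $xp->findvalue('@name', $section);

    # Inner loop: the path is relative to the current <section>.
    foreach my $page ($xp->find('page', $section)->get_nodelist) {
        push @lines, "$name: " . $page->string_value;
    }
}

print "$_\n" for @lines;
```

Add a third level to the document and a third loop appears inside the second. Note also that every path in the code is tied to the exact shape of the document.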

Also, it's the same old sort of coding I write every day. I'm learning XPath expressions, sure, but I'm learning nothing new about Perl, or about ways of thinking. This contrasts quite strongly with the call-back structure of code using XML::Parser and XML::SAX. Some people would say this is a good thing, in that the module (XML::XPath) is shielding me from certain complexities of the problem at hand. Hmmm.

Lastly, there is neither need nor expectation of using OO code here.

XML::Parser

The module most of us cut our teeth on, probably, either directly or via XML::DOM.

This module is call-back (handler) based, and hence you have to think quite differently about the processing logic as compared to using XML::XPath.

Actually, I think being comfortable with call-backs is an important skill, and modules like this are a good, and well-documented, way of developing this skill.
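For anyone who hasn't met the style, a minimal sketch of call-back based parsing follows. It assumes XML::Parser is installed; the inline document and the @events array are invented for illustration. You hand the parser a set of subs, and it decides when to call them.

```perl
use strict;
use warnings;
use XML::Parser;

# A small inline document, made up for this sketch.
my $xml = '<site><page title="Home">Welcome</page><page title="Perl">Modules</page></site>';

my @events;    # a record of what the parser told us, in order

my $parser = XML::Parser->new(
    Handlers => {
        # Called for each opening tag, with its attributes as a flat list.
        Start => sub {
            my ($expat, $element, %attr) = @_;
            push @events, "start $element";
        },
        # Called for each closing tag.
        End => sub {
            my ($expat, $element) = @_;
            push @events, "end $element";
        },
        # Called for runs of text; may fire more than once per text node.
        Char => sub {
            my ($expat, $text) = @_;
            push @events, "text '$text'" if $text =~ /\S/;
        },
    },
);

$parser->parse($xml);
print "$_\n" for @events;
```

There is no loop over the document anywhere in this code. The parser drives, and the handlers have to remember whatever context they need between calls.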

This particular module is stable and famous. And, if it does the job - fine.

I used XML::Parser when I decided to convert my web site from HTML files to a single XML file, and needed a way of reconstructing the HTML from the XML. In going through this I learned a lot. I became more comfortable with call-backs, and am very glad I went through this particular learning process.

But I was never happy with the structure of the code, although very pleased with it as a first attempt (i.e. I was ambivalent). I started programming as a uni student in 1970 (gasp), and I could feel in my bones that this code just wasn't quite right.

Again, there is neither need nor expectation of using OO code here.

However, time marches on, and so does software technology. Enter SAX.

XML::SAX

With XML::SAX, the call-backs live in a module of their own, and that handler module is based on XML::SAX::Base.

I've learned a lot in the last 3 years, of course, but even so, using SAX has been a revelation. It's a vastly more natural way of processing XML.

My web site XML file was 4,017 lines, almost all double spaced, and the original Perl code was 335 lines. Obviously no big deal.

That program also contained the HTML, which had the CSS embedded in it. In the new design, the CSS is in its own file.

I've now simplified the XML file slightly, so it's down to 3,678 lines. The SAX-based Perl is in 2 parts:

  • A module of 110 lines

    One instance of this handler object exists during the parse, and it accumulates all the data required to crank out HTML at the end of the parse.

    I think this is an exceptionally neat way of parsing XML, and am delighted with the result.

    Let me emphasise, I am not an OO fanatic. POP (Plain Old Perl) is usually fine by me. This project just happens to be one where the OO approach effortlessly leads to a superior design. And by superior, I mean clearer, neater, shorter.

    This object contains no reference whatsoever to the CSS which will be used to format the HTML.

  • A driver of 140 lines (and not quite finished)

    Almost all of this code is taken up with generating HTML tables out of the data accumulated in the handler object during the parse. It's here too that the CSS is invoked.

    I haven't bothered to write methods which return bits and pieces of the data as required by the main program. I just use the handler object as a hash ref and so access the data directly. Ok, so the OO purists will be going tut-tut. Who cares?

    Thus I create the HTML first, and can then tweak the CSS and review the result to my heart's content.

Not much of a code saving, if that was the intention. But then, it was not my intention.
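The two-part design described above can be sketched in miniature as follows. This is not my actual code, just an illustration of the shape: the package name My::Handler, the element name page, and the hash keys text and pages are all hypothetical. It assumes XML::SAX is installed along with at least one parser (XML::SAX::PurePerl ships with it).

```perl
use strict;
use warnings;

# Part 1: the handler. One object lives through the whole parse
# and accumulates data in its own hash.
package My::Handler;    # hypothetical name
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $element) = @_;
    $self->{text} = '' if $element->{Name} eq 'page';
}

sub characters {
    my ($self, $chars) = @_;
    # Text may arrive in several pieces, so append rather than assign.
    $self->{text} .= $chars->{Data} if defined $self->{text};
}

sub end_element {
    my ($self, $element) = @_;
    if ($element->{Name} eq 'page') {
        push @{ $self->{pages} }, $self->{text};
        undef $self->{text};
    }
}

# Part 2: the driver. It runs the parse, then turns the accumulated
# data into HTML, reading the handler directly as a hash ref.
package main;
use XML::SAX::ParserFactory;

my $xml = '<site><page>Welcome</page><page>Modules</page></site>';

my $handler = My::Handler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($xml);

print "<table>\n";
print "<tr><td>$_</td></tr>\n" for @{ $handler->{pages} };
print "</table>\n";
```

The handler knows nothing about HTML or CSS, and the driver knows nothing about parsing. That separation is the part I find neat.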

Resources

Articles/OpenOffice/XML.com Adventures with OpenOffice and XML [Feb. 07, 2001] http://www.xml.com/pub/a/2001/02/07/openoffice.html

Articles/Perl/XML/How to http://209.52.133.234/x3d/howto.html

Articles/Perl/XML/Igor's Webhome http://www.fh-frankfurt.de/~igor/projects/libxml/

Articles/Perl/XML/Perl SAX 2.0 Binding http://kmacleod.static.iaxs.net/~ken/perl-xml/sax-2.0.html

Articles/Perl/XML/Simple XML Validation with Perl http://www.xml.com/pub/a/2000/11/08/perl/index.html

Articles/Perl/XML/TPC 2001 Conference Presentations http://axkit.org/docs/presentations/tpc2001/

Articles/Perl/XML/XML Encodings http://standards.ieee.org/resources/spasystem/twig/encoding/encoding.html

Articles/Perl/XML/XML Encodings - index http://standards.ieee.org/resources/spasystem/twig/encoding/

Articles/Perl/XML/XML Modules http://standards.ieee.org/resources/spasystem/twig/perl_xml/perl_xml.html

Articles/Perl/XML/XML Modules - index http://standards.ieee.org/resources/spasystem/twig/perl_xml/

Articles/Perl/XML/XML and Perl http://www.oasis-open.org/cover/xmlAndPerl.html

Articles/Perl/XML/XML-RPC/web.oreilly.com -- Binary Data to Go Using XML-RPC to Serve Up Charts on the Fly http://web.oreilly.com/news/xmlrpc_0701.html

Articles/Perl/XML/XML.com Creating Scalable Vector Graphics with Perl [Jul. 11, 2001] http://www.xml.com/pub/a/2001/07/11/creatingsvg.html

Articles/Perl/XML/XML.com Creating Web Utilities Using XMLXPath http://www.xml.com/pub/a/2000/01/10/perlwebtools.html

Articles/Perl/XML/XML.com High-Performance XML Parsing With SAX [Feb. 14, 2001] http://www.xml.com/pub/a/2001/02/14/perlsax.html

Articles/Perl/XML/XML.com Perl XML Quickstart Convenience Modules [Jun. 13, 2001] http://www.xml.com/pub/a/2001/06/13/perlxml.html

Articles/Perl/XML/XML.com Perl XML Quickstart The Perl XML Interfaces [Apr. 18, 2001] http://www.xml.com/pub/a/2001/04/18/perlxmlqstart1.html

Articles/Perl/XML/XML.com Perl XML Quickstart The Standard XML Interfaces [May. 16, 2001] http://www.xml.com/pub/a/2001/05/16/perlxml.html

Articles/Perl/XML/XML.com Pyxie [Mar. 15, 2000] http://www.xml.com/pub/a/2000/03/15/feature/index.html

Articles/Perl/XML/XML.com Transforming XML With SAX Filters [Oct. 10, 2001] http://www.xml.com/pub/a/2001/10/10/sax-filters.html

Articles/Perl/XML/XML.com Using XSL Formatting Objects http://www.xml.com/pub/a/2001/01/17/xsl-fo/index.html

Articles/Perl/XML/XML.com Writing SAX Drivers for Non-XML Data [Sep. 19, 2001] http://www.xml.com/pub/a/2001/09/19/sax-non-xml-data.html

Articles/Perl/XML/XML.com XMLLibXML - An XMLParser Alternative [Nov. 14, 2001] http://www.xml.com/pub/a/2001/11/14/xml-libxml.html

Articles/Perl/XML/XML.com XMLParser and Character Encodings http://www.xml.com/lpt/a/2000/04/26/encodings/xmlparser.html

Articles/Perl/XML/XML::Twig/Tutorial http://standards.ieee.org/resources/spasystem/twig/

Articles/Perl/XML/XML::Twig/Tutorial - examples http://standards.ieee.org/resources/spasystem/twig/tutorial/

Articles/Perl/XML/XML::Twig/Using XMLTwig [Mar. 21, 2001] http://www.xml.com/pub/a/2001/03/21/xmltwig.html

Articles/Perl/XML/XML::Twig/Ways to Rome http://standards.ieee.org/resources/spasystem/twig/perl_survey/perl_xml_survey.html

Articles/Perl/XML/XML::Twig/Ways to Rome - index http://standards.ieee.org/resources/spasystem/twig/perl_survey/

Articles/Perl/XML developer news from XMLhack by and for the XML community http://xmlhack.com/list.php?cat=10

Articles/Perl/XSP/XSP & Apache http://xml.apache.org/cocoon/xsp.html

Articles/XML/XML etc tutorial The XML Revolution http://www.brics.dk/~amoeller/XML/

Articles/XML/XMLperl X Marks (up) the Language http://xmlperl.com/articles/ebohlman/xmarkslang3.php

FAQs/XML/FAQ (W3C) http://www.ucc.ie/xml/

FAQs/XSL Frequently Asked Questions http://www.dpawson.freeserve.co.uk/xsl/xslfaq.html

Security/Windows 2000/New XML-Based Security Site; Hardening Windows 2000 http://www.shavlik.com/pr_xml_mssecure.asp

XML/DMJG.DE's SkatDoc XML http://www.dmjg.de/skatdoc/

XML/Enno's Home Page http://users.erols.com/enno/

XML/IBM - Visual XML Tools http://alphaworks.ibm.com/tech/visualxmltools

XML/LibXML http://xmlsoft.org/

XML/MSDN Online XML Developer Center http://www.msdn.microsoft.com/xml/default.asp

XML/Mike J. Brown's XML and XSL stuff http://www.skew.org/xml/

XML/Orchard Source and Documentation http://casbah.org/~kmacleod/orchard/

XML/Perl & XML & HTML - IBM http://www-4.ibm.com/software/developer/library/xml-perl/

XML/Perl modules http://www.xml.com/pub/2000/04/05/feature/index.html

XML/Pyxie Home Page http://www.digitome.com/pyxie.html

XML/RXP - Validating open source parser in C http://www.cogsci.ed.ac.uk/~richard/rxp.html

XML/Sablotron/Ginger Alliance http://www.gingerall.com/

XML/Take23 news and resources for the mod_perl world http://take23.org/

XML/The XML C library for Gnome http://www.xmlsoft.org/

XML/The XML Companion http://www.bradley.co.uk/xmlbook.htm

XML/The XML Cover Pages - Home Page http://www.oasis-open.org/cover/sgml-xml.html

XML/XER (XML Encoding Rules) http://asf.gils.net/xer/

XML/XML - Books and essays by Simon St.Laurent http://www.simonstl.com/

XML/XML Bible http://metalab.unc.edu/xml/books/bible/

XML/XML Cooktop http://xmleverywhere.com/cooktop/

XML/XML Data Binding Resources http://www.rpbourret.com/xml/XMLDataBinding.htm

XML/XML Database Products http://www.rpbourret.com/xml/XMLDatabaseProds.htm

XML/XML Protocol Comparisons http://www.w3.org/2000/03/29-XML-protocol-matrix

XML/XML Query Engine http://www.fatdog.com/#DOWNLOAD

XML/XML Recommendation http://www.w3.org/TR/REC-xml

XML/XML Schema tutorial http://www.xfront.com/

XML/XML School http://www.refsnesdata.no/xml/default.asp

XML/XML School (2) http://www.w3schools.com/xml/default.asp

XML/XML tools by category http://www.garshol.priv.no/download/xmltools/cat_ix.html

XML/XML-Edifact.org http://www.xml-edifact.org/pub/

XML/XML-RPC Home Page http://xmlrpc.org/

XML/XML-TiePYX in Perl modules http://www.omsdev.com/ebohlman/perlmodules/

XML/XML.com http://www.xml.com/pub

XML/XML.com - Getting started http://www.xml.com/pub/norm/part1/getstart1.html

XML/XML.com XML From the Inside Out -- XML development, XML resources, XML specifications http://www.xml.com/

XML/XML.org -- Industry News http://www.xml.org/xml/news_market.shtml

XML/XMLDB Initiative Enterprise Technologies for XML Databases http://www.xmldb.org/

XML/XMLSOFTWARE The XML Software Site http://www.xmlsoftware.com/

XML/XMLhack by and for the XML community http://www.xmlhack.com/

XML/XMLperl First stop on the XML-Perl highway http://xmlperl.com/

XML/XPathScript An Alternative To XSLT http://www.xml.com/pub/2000/07/05/xpathscript/index.html

XML/XSA XML Software Autoupdate http://www.garshol.priv.no/download/xsa/

XML/XSL Transformations XSLT Alleviates XML Schema Incompatibility Headaches -- MSDN Magazine, August 2000 http://msdn.microsoft.com/msdnmag/issues/0800/XSLT/XSLT.asp

XML/Xerces/Xerces PPM for Windows http://www.xmlproj.com/xerces/windows_install.html

Author

Ron Savage.

Home page: http://savage.net.au/index.html

This POD was converted to HTML by /Perl.html#fancy-pom2.pl

  • Version: 1.01 01-Jun-2006

    This version disguises my email address.

  • Version: 1.00 25-Nov-2001

    Original version.

Licence

Australian Copyright © 2002 Ron Savage. All rights reserved.

	All Programs of mine are 'OSI Certified Open Source Software';
	you can redistribute them and/or modify them under the terms of
	The Artistic License, a copy of which is available at:
	http://www.opensource.org/licenses/index.html
 