Name

Table of contents

Name
Overview
XML::XPath
XML::Parser
XML::SAX
Resources

Name

XML Ramble

Overview

Perl, as always, gives you various ways to do things. Processing XML is no exception.

Here I offer a few comments on XML::XPath, XML::Parser (and hence XML::DOM, which is built on it), and XML::SAX.

XML::XPath

Perl has a number of mini-languages in it. Take:

        $x = './x +';

From there we could have:

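        $a = q|$x|;     # single-quoted string: the literal characters $x
        $b = qq|$x|;    # double-quoted string: interpolates to './x +'
        $c = qr|$x|;    # compiled regular expression built from './x +'
        $d = qx|$x|;    # runs './x +' as a shell command, captures its output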

Then there are XS, POD, the full monty of regular expressions, even here documents, etc. So it comes as no real surprise when the XML::XPath docs say:

        "nodeset = find($path, [$context])

The find function takes an XPath expression (a string) and returns either an XML::XPath::NodeSet object containing the nodes it found (or empty if no nodes matched the path), or one of XML::XPath::Literal (a string), XML::XPath::Number, or XML::XPath::Boolean."

Note the phrase 'XPath expression'. This is the XPath mini-language I referred to in my original posting. The return value could be construed as belonging to yet another mini-language.
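
To make the point about return values concrete, here is a minimal sketch of asking find() what it gave back. It assumes XML::XPath is installed. The sample document and the describe() helper are made up for illustration and are not part of the module.

```perl
use strict;
use warnings;
use XML::XPath;

# A tiny sample document, invented for this sketch.
my $xp = XML::XPath->new(xml => '<html><body><p>one</p><p>two</p></body></html>');

# describe() is a hypothetical helper: it reports which kind of object
# find() returned for a given XPath expression.
sub describe {
    my ($result) = @_;
    return 'nodeset of ' . $result->size . ' node(s)'
        if $result->isa('XML::XPath::NodeSet');
    return 'number ' . $result->value
        if $result->isa('XML::XPath::Number');
    return 'literal ' . $result->value
        if $result->isa('XML::XPath::Literal');
    return 'boolean ' . ($result->value ? 'true' : 'false')
        if $result->isa('XML::XPath::Boolean');
    return 'unknown';
}

# One expression for each kind of return value.
print describe($xp->find($_)), "\n"
    for '/html/body/p', 'count(/html/body/p)',
        'string(/html/body/p[1])', 'count(//p) > 1';
```

The same find() call yields four different classes here, so code that accepts arbitrary XPath expressions has to check what came back before using it.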

As I see it, this has 2 implications:

a) You need to know the exact structure of the XML document being parsed in order to craft meaningful XPath expressions. Of course, you normally do know the structure, and most likely you need to know it no matter which module you use. Nevertheless, this affects the generality of the code you write.

b) Your code will be procedural in structure, and you'll basically process the XML document with 'for' statements. Eg (again from the XML::XPath docs):

        "my $nodeset = $xp->find('/html/body/p'); # find all paragraphs

        foreach my $node ($nodeset->get_nodelist) {
                print "FOUND\n\n",
                        XML::XPath::XMLParser::as_string($node),
                        "\n\n";
        }"

There's nothing wrong with this method of analyzing an XML document. If it does the job - fine. I think of this method as a bit like plodding, or as a line-by-line approach (what I called 'linear' in my original posting).

For an XML document with any sort of structure, the 'for' loops are going to become more and more nested, and it's this aspect of the XPath approach which I feel uneasy about.
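
A minimal sketch of how the nesting creeps in, assuming XML::XPath is installed. The site/section/page document is invented for illustration, and each extra level in the document costs another 'for' loop in the code.

```perl
use strict;
use warnings;
use XML::XPath;

# Invented sample document with two levels of structure.
my $xml = <<'XML';
<site>
  <section name="perl">
    <page title="Modules"/>
    <page title="Articles"/>
  </section>
  <section name="xml">
    <page title="Parsers"/>
  </section>
</site>
XML

my $xp = XML::XPath->new(xml => $xml);
my @lines;

# Outer loop: one pass per <section>.
foreach my $section ($xp->find('/site/section')->get_nodelist) {
    my $name = $xp->findvalue('@name', $section);

    # Inner loop: the second argument makes $section the context node,
    # so the relative path 'page' only matches this section's children.
    foreach my $page ($xp->find('page', $section)->get_nodelist) {
        my $title = $xp->findvalue('@title', $page);
        push @lines, "$name: $title";
    }
}

print "$_\n" for @lines;
```

Note that both XPath expressions hard-code the element names, which is point (a) above in action.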

Also, it's the same old sort of coding I write every day. I'm learning XPath expressions, sure, but I'm learning nothing new about Perl, or about ways of thinking. Some people would say this is a good thing, in that the module (XML::XPath) is shielding me from certain complexities of the problem at hand. Hmmm. Either way, it contrasts quite strongly with the call-back structure of code using XML::Parser and XML::SAX.

Lastly, there is neither need nor expectation of using OO code here.

XML::Parser

The module most of us cut our teeth on, probably, either directly or via XML::DOM.

This module is call-back (handler) based, and hence you have to think quite differently about the processing logic as compared to using XML::XPath.
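
A minimal sketch of the call-back style, assuming XML::Parser is installed. The handler names, the sample document and the @events list are invented for illustration. The point is that the parser drives the program, and your code only reacts to what it reports.

```perl
use strict;
use warnings;
use XML::Parser;

my @events;       # what the handlers saw, in document order
my $depth = 0;    # current nesting level, maintained by the handlers

# Called by the parser for each opening tag.
sub handle_start {
    my ($expat, $element, %attr) = @_;
    push @events, ('  ' x $depth++) . "<$element>";
}

# Called by the parser for each closing tag.
sub handle_end {
    my ($expat, $element) = @_;
    push @events, ('  ' x --$depth) . "</$element>";
}

# Called by the parser for text; whitespace-only runs are skipped.
sub handle_char {
    my ($expat, $text) = @_;
    return unless $text =~ /\S/;
    push @events, ('  ' x $depth) . $text;
}

my $parser = XML::Parser->new(
    Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    },
);

$parser->parse('<page><title>XML Ramble</title></page>');
print "$_\n" for @events;
```

There is no loop over the document anywhere in this code. Any state that must survive from one call-back to the next, such as $depth here, has to be kept somewhere the handlers can all reach.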

Actually, I think being comfortable with call-backs is an important skill, and modules like this are a good, and well-documented, way of developing this skill.

This particular module is stable and famous. And, if it does the job - fine.

I used XML::Parser when I decided to convert my web site from HTML files to a single XML file, and needed a way of reconstructing the HTML from the XML. In going through this I learned a lot. I became more comfortable with call-backs, and am very glad I went through this particular learning process.

But I was ambivalent about the result: very pleased with it as a first attempt, yet never happy with the structure of the code. I started programming as a uni student in 1970 (gasp), and I could feel in my bones that this code just wasn't quite right.

Again, there is neither need nor expectation of using OO code here.

However, time marches on, and so does software technology. Enter SAX.

XML::SAX

With XML::SAX, the call-backs live in a module of their own (the handler), and that module is a subclass of XML::SAX::Base.
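
A minimal sketch of that arrangement, assuming XML::SAX is installed with at least one parser registered (it ships with XML::SAX::PurePerl). The My::TitleHandler package, its titles() method and the sample document are invented for illustration. For brevity both packages are in one file here, whereas normally the handler would sit in its own .pm file.

```perl
# The handler: a hypothetical package holding all the call-backs.
package My::TitleHandler;
use strict;
use warnings;
use base 'XML::SAX::Base';

# Called for every opening tag; start collecting text inside <title>.
sub start_element {
    my ($self, $element) = @_;
    if ($element->{Name} eq 'title') {
        $self->{in_title} = 1;
        $self->{buffer}   = '';
    }
}

# Text may arrive in several chunks, so append rather than assign.
sub characters {
    my ($self, $chars) = @_;
    $self->{buffer} .= $chars->{Data} if $self->{in_title};
}

# Called for every closing tag; store the finished title.
sub end_element {
    my ($self, $element) = @_;
    if ($element->{Name} eq 'title') {
        push @{ $self->{titles} }, $self->{buffer};
        $self->{in_title} = 0;
    }
}

# Accessor for the collected titles.
sub titles { return @{ $_[0]{titles} || [] } }

# The driver: it only wires the handler to a parser.
package main;
use XML::SAX::ParserFactory;

my $handler = My::TitleHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string('<site><page><title>XML Ramble</title></page></site>');

print "$_\n" for $handler->titles;
```

The state that had to be kept in file-level variables in a plain XML::Parser script now lives inside the handler object, which is where the OO structure comes from.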

I've learned a lot in the last 3 years, of course, but even so, using SAX has been a revelation. It's a vastly more natural way of processing XML.

My web site XML file was 4,017 lines, almost all double spaced, and the original Perl code was 335 lines. Obviously no big deal.

That program also contained the HTML, which in turn had the CSS embedded in it. In the new design, the CSS is in its own file.

I've now simplified the XML file slightly, so it's down to 3,678 lines. The SAX-based Perl is in 2 parts:

Not much of a code saving, if that was the intention. But then, it was not my intention.

Resources

Articles/OpenOffice/XML.com Adventures with OpenOffice and XML [Feb. 07, 2001]

Articles/Perl/XML/How to

Articles/Perl/XML/Igor's Webhome

Articles/Perl/XML/Perl SAX 2.0 Binding

Articles/Perl/XML/Simple XML Validation with Perl

Articles/Perl/XML/TPC 2001 Conference Presentations

Articles/Perl/XML/XML Encodings

Articles/Perl/XML/XML Encodings - index

Articles/Perl/XML/XML Modules

Articles/Perl/XML/XML Modules - index

Articles/Perl/XML/XML and Perl

Articles/Perl/XML/XML-RPC/web.oreilly.com -- Binary Data to Go Using XML-RPC to Serve Up Charts on the Fly

Articles/Perl/XML/XML.com Creating Scalable Vector Graphics with Perl [Jul. 11, 2001]

Articles/Perl/XML/XML.com Creating Web Utilities Using XMLXPath

Articles/Perl/XML/XML.com High-Performance XML Parsing With SAX [Feb. 14, 2001]

Articles/Perl/XML/XML.com Perl XML Quickstart Convenience Modules [Jun. 13, 2001]

Articles/Perl/XML/XML.com Perl XML Quickstart The Perl XML Interfaces [Apr. 18, 2001]

Articles/Perl/XML/XML.com Perl XML Quickstart The Standard XML Interfaces [May. 16, 2001]

Articles/Perl/XML/XML.com Pyxie [Mar. 15, 2000]

Articles/Perl/XML/XML.com Transforming XML With SAX Filters [Oct. 10, 2001]

Articles/Perl/XML/XML.com Using XSL Formatting Objects

Articles/Perl/XML/XML.com Writing SAX Drivers for Non-XML Data [Sep. 19, 2001]

Articles/Perl/XML/XML.com XMLLibXML - An XMLParser Alternative [Nov. 14, 2001]

Articles/Perl/XML/XML.com XMLParser and Character Encodings

Articles/Perl/XML/XML::Twig/Tutorial

Articles/Perl/XML/XML::Twig/Tutorial - examples

Articles/Perl/XML/XML::Twig/Using XMLTwig [Mar. 21, 2001]

Articles/Perl/XML/XML::Twig/Ways to Rome

Articles/Perl/XML/XML::Twig/Ways to Rome - index

Articles/Perl/XML developer news from XMLhack by and for the XML community

Articles/Perl/XSP/XSP & Apache

Articles/XML/XML etc tutorial The XML Revolution

Articles/XML/XMLperl X Marks (up) the Language

FAQs/XML/FAQ (W3C)

FAQs/XSL Frequently Asked Questions

Security/Windows 2000/New XML-Based Security Site; Hardening Windows 2000

XML/DMJG.DE's SkatDoc XML

XML/Enno's Home Page

XML/IBM - Visual XML Tools

XML/LibXML

XML/MSXML Resources

XML/Mike J. Brown's XML and XSL stuff

XML/Orchard Source and Documentation

XML/Perl & XML & HTML - IBM

XML/Perl modules

XML/Pyxie Home Page

XML/RXP - Validating open source parser in C

XML/Sablotron/Ginger Alliance

XML/Take23 news and resources for the mod_perl world

XML/The XML C library for Gnome

XML/The XML Companion

XML/The XML Cover Pages - Home Page

XML/XER (XML Encoding Rules)

XML/XML - Books and essays by Simon St.Laurent

XML/XML Bible

XML/XML Cooktop

XML/XML Data Binding Resources

XML/XML Database Products

XML/XML Protocol Comparisons

XML/XML Query Engine

XML/XML Recommendation

XML/XML Schema tutorial

XML/XML School

XML/XML School (2)

XML/XML tools by category

XML/XML-Edifact.org

XML/XML-RPC Home Page

XML/XML-TiePYX in Perl modules

XML/XML.com

XML/XML.com - Getting started

XML/XML.com XML From the Inside Out -- XML development, XML resources, XML specifications

XML/XML.org -- Industry News

XML/XMLDB Initiative Enterprise Technologies for XML Databases

XML/XMLSOFTWARE The XML Software Site

XML/XMLhack by and for the XML community

XML/XMLperl First stop on the XML-Perl highway

XML/XPathScript An Alternative To XSLT

XML/XSA XML Software Autoupdate

XML/XSL Transformations XSLT Alleviates XML Schema Incompatibility Headaches -- MSDN Magazine, August 2000

XML/Xerces/Xerces PPM for Windows