June 20, 2001
SAX users will be interested in a little exchange (actually a disagreement) that took place on XML-DEV this week about how SAX might best be altered to support entity catalogs. Regular readers will remember that the issue of using catalogs -- a means to map SYSTEM and PUBLIC identifiers into physical locations -- was covered in an XML-Deviant article last year ("Filling in the Gaps", and see also "What's In a Name?" for additional discussion of entity management issues). This coincided with the release of some useful supporting classes from Norm Walsh, which allowed XML catalog support to be plugged into existing parsers.
The SGML Open/OASIS catalog was very useful to many people, and there are a fair number of experts who feel it was a mistake not to include something like that in XML...
The OASIS Entity Resolution Technical Committee released its latest draft on the 12th of June, prompting Rob Lugt to attempt an implementation. However, Lugt quickly realized that SAX wasn't providing the information he needed, specifically both the base and relative URIs in their original forms. SAX currently passes only absolute URIs to the application, making it difficult to implement an alternate entity resolver. This led Lugt to post a SAX enhancement request to XML-DEV:
The XML Catalog draft specification describes several different entry types. Some entries are for resolving public identifiers, some are for resolving system identifiers. The system identifier entries are intended to be matched with the system identifier as it appears in the xml document being processed. Unfortunately, SAX 2.0 requires that system identifiers that are URIs are made absolute before calling the EntityResolver, thereby robbing the catalog processor of the opportunity to compare the system identifier with the catalog entries.
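To make the mismatch Lugt describes concrete, here is a minimal sketch in Python. It does not use a real SAX parser or a real catalog implementation: the catalog is a plain dictionary, the URIs are made up, and `absolutize` and `resolve_system` are hypothetical helpers that only imitate the behavior described above (the system identifier is made absolute before the resolver sees it).

```python
from urllib.parse import urljoin

# Hypothetical catalog: maps a system identifier, as written in the
# document, to a local copy. All names and URIs here are illustrative.
CATALOG_SYSTEM_ENTRIES = {
    "dtds/chapter.dtd": "file:///usr/local/share/xml/chapter.dtd",
}


def absolutize(base_uri, system_id):
    """Imitate the parser: make the system identifier absolute
    against the URI of the resource that contains the declaration."""
    return urljoin(base_uri, system_id)


def resolve_system(system_id):
    """Look the identifier up in the catalog; None means no match."""
    return CATALOG_SYSTEM_ENTRIES.get(system_id)


base = "http://example.org/books/book.xml"   # where the declaration lives
as_written = "dtds/chapter.dtd"              # what the document author typed
as_reported = absolutize(base, as_written)   # what the resolver is handed

print(as_reported)
print(resolve_system(as_written))
print(resolve_system(as_reported))
```

Looking up the identifier as written finds the local copy, while looking up the absolute form that the resolver is actually given returns `None`. The catalog entry is never consulted, which is the lost opportunity Lugt is complaining about.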
David Brownell strongly disagreed with the proposal, believing it to be outside of the functionality described in the XML specification.
Seems to me THAT is the problem, not SAX.
The XML spec is quite explicit on this topic: "relative URIs are relative to the location of the resource within which the entity declaration occurs" (4.2.2).
Those are the only contexts in which an XML parser needs to resolve URIs, and there's no weasel-wording that would allow what that catalog spec is intending to do. So I don't see why SAX should permit anything else, unless the XML spec gets a substantive functional change there ...
As it turned out, Brownell was apparently in a minority, with Norman Walsh, Paul Grosso and James Clark supporting the requirement, if not the specific proposal, from Lugt. Typically for any debate involving close interpretation of a specification, and the spectre of relative URIs, things got quite heated. Brownell believed both the Infoset and XML specifications supported the SAX design.
Well, since even the infoset does not include the information which you're proposing be exposed, I think you're putting the shoe on the wrong foot. I count two W3C specs (and SAX) that are consistent with the interpretation I've presented. You're the one proposing a change in basic XML infrastructure, so you (or maybe that TC) have the onus to resolve this.
Paul Grosso responded by noting that the committee, composed of many members of the XML Core Working Group, believed it was operating well within specified behavior:
About half of the OASIS Entity Resolution Technical Committee (ERTC) are active members of the W3C XML Core Working Group, the group with the responsibility for maintaining and interpreting the XML 1.0 Recommendation as well as developing the Infoset spec.
Whereas reasonable people may disagree on the interpretation of something in almost any written work, David's viewpoint of what is in compliance with the XML Recommendation is not shared by a fair number of the W3C XML Core Working Group members. As far as the Infoset, the XML Core WG also wrote that specification (the original editor of that spec being one of the key members of the OASIS Entity Resolution Technical Committee), and most of us don't believe that anything therein is contrary to the positions taken in the XML Catalog draft.
The outcome is likely to be that the catalog specification will continue to rely on the availability of both base and relative URIs, with a subsequent knock-on effect on the SAX API.
Luckily this need not involve a significant API change. Lugt's proposal, revised to incorporate feedback from the list, suggests adding some additional standard properties and a new feature to SAX to support the passing of this information to interested applications.
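The article does not give the names of the proposed properties or feature, so the following Python sketch invents its own. It only illustrates what a catalog-aware resolver could do once it has both pieces of information: `CatalogResolver`, `original_system_id` and `base_uri` are hypothetical names, and falling back to the absolute URI when the catalog has no entry is this sketch's choice rather than anything taken from the proposal or the catalog draft.

```python
from urllib.parse import urljoin


class CatalogResolver:
    """Hypothetical resolver that is handed both the system identifier
    as written and the base URI, instead of one absolute URI."""

    def __init__(self, system_entries):
        # Copy the mapping so later changes by the caller don't leak in.
        self.system_entries = dict(system_entries)

    def resolve(self, original_system_id, base_uri):
        # First try the identifier exactly as it appeared in the document.
        hit = self.system_entries.get(original_system_id)
        if hit is not None:
            return hit
        # No catalog match: fall back to the ordinary rule and resolve
        # the relative URI against the base URI.
        return urljoin(base_uri, original_system_id)


resolver = CatalogResolver(
    {"dtds/chapter.dtd": "file:///usr/local/share/xml/chapter.dtd"}
)
base = "http://example.org/books/book.xml"
print(resolver.resolve("dtds/chapter.dtd", base))  # catalog entry wins
print(resolver.resolve("dtds/other.dtd", base))    # falls back to absolute URI
```

Keeping the fallback inside the resolver means that an application without a matching catalog entry behaves exactly as it does today, which fits the aim of avoiding a significant API change.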
By way of an update on the overall state of SAX itself, David Megginson made the following comments to the list, thanking those involved for their efforts:
Thanks to everyone for including me in the discussion on a new property for SAX2. I apologize that it has taken me so long to get out a bug-fix release; to try to speed things up (and to cut down on the number of duplicate bug reports), I've applied for a SourceForge SAX account; once that's set up, we can collect all the bug reports and feature requests there, and they'll be visible to the public at large rather than being hidden in a file on my hard drive. I don't know if the CVS support will be much help, since SAX2 is meant to change *very* infrequently (since it's low-level infrastructure and a lot depends on its stability), but it will be there all the same.
Megginson also noted that he would be following a similar approach with his XML Writer and RDF Filter applications.
This debate is instructive, not only because it signals a change to everyone's favorite home-brewed XML API, but also because it shows that, despite considerable scrutiny over the last three years, the XML specification still has a few areas open to incompatible interpretation. No doubt there are some surprises awaiting us in some of the other related specifications, especially when relative URIs are thrown into the mix.