A Few Bumps

August 9, 2000

While Leigh Dodds takes a well-earned vacation, I find myself looking through the XML-Deviant's beady eye at the world of XML and XML developers. This week I'm reporting on a few bumps in XML's otherwise streamlined route to world domination: one of them the consequence of success, one of progress, and another a bogeyman lurking under XML implementors' beds.

The Expat Problem

Praise for James Clark's work in creating expat is universal, and has featured in these pages on more than one occasion. The very success of this C-based non-validating XML parser is now causing difficulties, however. The problem arises because many open source products have integrated expat by including its source code rather than by linking to a shared library. On XML-DEV, Matt Sergeant outlined the difficulty:

What has happened is that many people have compiled expat statically into their application or library. The application I'm having most problems with is Apache. Apache compiles in expat as a default. It compiles it in statically (doesn't use a shared libxmltok/libxmlparse). Then you add PHP to the mixture. That contains its own version of expat when you enable the XML features ... Then you mix in a little mod_perl and introduce XML::Parser - that's another statically linked expat. And now I've got XML::Sablotron, now that does the right thing - gives you a dynamically loaded expat. But of course expat is already loaded, so there's another conflict...

The result is a segfault.

And that's not the only problem. Without an active central place of maintenance, "altered expats" are causing difficulties, and there's no central strand of development to support the needs of people integrating expat.

Not a wanton complainer by nature, Sergeant went on to propose a solution:

Do the same as 4XT.org has done. Register the domain name 4expat.org and maintain a version of expat that has all the features that XML::Parser, Sablotron, Apache and PHP (and others) need. Distribute a single .tar.gz of Expat that builds a shared library only. Distribute rpm's for Linux, and maybe a Windows installer if people really want that. Maintain a mailing list. Communicate with all the major library maintainers to make sure that we can all use a single shared expat library (DSO).

In other words, a group of expat integrators should become its new maintainers. This ambition seems to conflict with Clark's Thai Open Source Software initiative, which according to its home page will "assume distribution and maintenance of all James Clark's existing software". However, the need certainly seems pressing, and ought not to go unattended. Sergeant concludes:

This is a real and valid problem, that I'd love to see sorted out in a community way. And I'm assuming that James Clark would like to see his child go off into the big wide world now, a-la XT. If I'm wrong on that, James, I apologise, but the problem needs sorting somehow.

DTD & Namespace Anguish

The introduction of Namespaces in XML continues to have repercussions, and it looks as though that will remain the case for some time to come. One significant point of disagreement is: do namespaces work with DTDs?

Yes, definitely.

No, not at all.

Paul Abrahams expands:

What's striking is that the proponents of the opposing views (namespaces work with DTD's; they don't) treat the answer as obvious and hardly worth discussion ... The problem I see is very simple. If I publish a DTD for general use with the intent that the names within it be declared by the importer as a namespace, then that namespace has a prefix. Since the DTD is intended to be universally applicable no matter what the prefix of the namespace describing it, it cannot, itself, use prefixed names: what prefix would it use?

He goes on to explain that such a scenario would make validation against a DTD with prefixes impossible. Noting that there is an "awkward kludge" using entity references, Abrahams called for an amendment to a merged XML 1.0 and Namespaces specification:

It would be possible to solve this problem by extending the syntax of EntityDecl to allow the specification of an implicit prefix for an included DTD, but that would require integration of the Namespace spec into the XML spec. Personally I believe that would be a Good Thing, but it doesn't appear to be in the cards.
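The prefix problem Abrahams describes can be illustrated with a small sketch. It uses Python's standard binding to expat, which does not validate, so it only shows the names each kind of processor would see. The namespace URI, the element names, and the helper element_names are hypothetical and chosen purely for illustration. A DTD matches the literal qualified names, while a namespace-aware application sees the URI and local name.

```python
import xml.parsers.expat

# Hypothetical namespace URI, used only for this illustration.
NS = "http://example.org/books"

# Two documents that differ only in the prefix bound to the same namespace.
DOC_A = '<a:book xmlns:a="http://example.org/books"><a:title>XML</a:title></a:book>'
DOC_B = '<b:book xmlns:b="http://example.org/books"><b:title>XML</b:title></b:book>'


def element_names(document, namespace_aware):
    """Return the element names, in document order, as expat reports them."""
    names = []
    if namespace_aware:
        # With a separator set, expat reports names as "<uri> <local-name>".
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # Without namespace processing, expat reports the raw qualified
        # name, which is the name a DTD declaration would have to match.
        parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


if __name__ == "__main__":
    print(element_names(DOC_A, False))  # qualified names, prefix "a"
    print(element_names(DOC_B, False))  # qualified names, prefix "b"
    print(element_names(DOC_A, True) == element_names(DOC_B, True))
```

Seen through namespace processing the two documents are equivalent, but a DTD that declares a:book says nothing about b:book, which is the mismatch at the heart of the disagreement.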

From there, the discussion moved on to a consideration of what XML validity actually means in the context of documents mixing multiple vocabularies by use of namespaces. James Robertson asks:

Isn't the issue that namespaces allow you to mix information from a number of sources, however you see fit? Every document can have different elements, and yet still be considered OK according to well-formed and namespace rules ...

How do we handle this behaviour, and still make some use of DTDs?

Norm Walsh observed that such mixing "violates the principles of validity at their core." Rick Jelliffe agreed, and elaborated:

That's a key point, I think. Roger Costello has argued very strongly that namespaced-vocabularies should have schema languages which are open by default rather than closed.

The utility of the idea of "content model" almost disappears with open, namespaced schemas. What is important is not that a follows b*, but that information items which have a strong semantic cohesiveness should not get decoupled.

I would go further, and suggest that content models are actually bad in XML (in SGML they serve a lexical purpose for which they are well-suited): this is because we have anonymous groups and various arrangements which impose constraints but whose significance is not available...

Jelliffe went on to sound the death knell for DTDs used in combination with namespaces. The great hope is now with other schema languages:

If you want schemas that are namespace aware get behind RELAX, or integrate Schematron, or review and test XML Schemas against some nice big schemas, or implement proper SGML architectural forms processors, or ...
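As a rough illustration of the "open by default" idea raised in this discussion, the sketch below checks only the elements belonging to one namespace and ignores everything from foreign namespaces. It is not RELAX, Schematron, or XML Schema, and it checks names only, not structure. The namespace URIs, the set of known names, and the helper functions are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: a namespace URI and the local names it defines.
INVOICE_NS = "http://example.org/invoice"
KNOWN_NAMES = {"invoice", "item", "total"}


def split_tag(tag):
    """Split ElementTree's "{uri}local" tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def unknown_elements(document):
    """Open check: report unknown names in INVOICE_NS, ignore other namespaces."""
    problems = []
    for element in ET.fromstring(document).iter():
        uri, local = split_tag(element.tag)
        if uri == INVOICE_NS and local not in KNOWN_NAMES:
            problems.append(local)
    return problems


# A document mixing the invoice vocabulary with a foreign one.
MIXED = (
    '<inv:invoice xmlns:inv="http://example.org/invoice"'
    ' xmlns:note="http://example.org/notes">'
    "<inv:item>Widget</inv:item>"
    "<note:remark>Fragile</note:remark>"
    "<inv:total>10</inv:total>"
    "</inv:invoice>"
)

# A document using a name the invoice vocabulary does not define.
BAD = (
    '<inv:invoice xmlns:inv="http://example.org/invoice">'
    "<inv:discount>5</inv:discount>"
    "</inv:invoice>"
)

if __name__ == "__main__":
    print(unknown_elements(MIXED))
    print(unknown_elements(BAD))
```

A closed check would reject the foreign note:remark element outright; the open check lets it pass and complains only about names it claims to own.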

Worst Case Scenario

Jelliffe's somewhat conservative language when speaking of XML Schemas, as opposed to RELAX or Schematron, sets the tone for the final bump in this week's column. In recent weeks, praise for XML Schemas has been hard to find, and Bob Kline, trying to choose among the proliferating schema technologies, put some important questions very frankly:

We have no shortage of proposals for approaches to the problem of specifying constraints on document content ... Unfortunately, the only one of these which has the backing of the W3C behind it -- XML Schema -- has a number of significant drawbacks. In spite of the frequent complaints (not difficult to understand) about the complexity of the draft specification, XML Schema does not address all of the basic requirements commonly needed in document validation.

He goes on to observe that XML Schema is in the "worst of both worlds": it has no tool support to buffer developers from its complexity, yet it is so far advanced that the W3C Working Group cannot simplify it enough for developers to write their own processors. Kline is stuck in a difficult situation, having to choose between writing a homespun schema language, gambling on "ephemeral" proposals, or simply leaving some validation functionality unimplemented.

Kline lays these issues out at the end of his message, and phrases the most pertinent question in the whole XML Schemas affair:

How did Schema get so far, with a spec that's harder to read than any of the others, and still leave out functionality the absence of which is causing all of these competing specs to appear?

Neither answer nor rebuttal has yet appeared. We will be watching carefully to see if this question receives a response over the next week. If it doesn't, the silence will speak volumes.