The W3C is a curious beast. While moral in the sense that it evangelizes open standards and accessibility, it is strangely amoral in its obedience to member vendor whim and its insistence on the one true path of W3C-blessed technologies. Both of the topics I tackle this week are, in a way, results of that latter tendency. In this column I revisit the validation thread on XML-DEV, shaking the "one true way" assumptions on schema usage, and take a swipe at the latest SOAP specifications.
Fallacies of Validation
I am exceedingly grateful to Roger Costello who has taken the threads of discussion about validation covered last week and summarized them on XML-DEV under a series of posts entitled "Fallacies of Validation." Never one to look a gift horse in the mouth, I'll focus on Costello's summary.
In his most recent revision, Costello identifies seven fallacies.
Fallacy of "the Schema"
Drawing from comments from Michael Kay and Len Bullard, Costello notes that a system needn't have just a single schema.
Both Michael and Len are stating that in a system there should be numerous schemas. This is a big mindshift for me. I admit being trapped into thinking that there should be a single schema.
Fallacy of Schema Locality
Costello says that this issue arises when schemas encode local customs, but end up being used in a global scenario. I'm not sure this is a fallacy, but it's certainly a trap.
Fallacy of Requisite Validation
This might equally be called the fallacy of overaggressive validation. There are situations where validation doesn't win the system much and ends up inconveniencing others. Quoting Michael Kay talking about postal address validation:
The strategy (validating the user's address) assumes that you know better than your customers what constitutes a valid address. Let's face it, you don't, and you never will. A much better strategy is to let them (the user) express their address in their own terms.
Fallacy of Validation as a Pass/Fail Operation
This is a particularly intriguing point, as it flies in the face of traditional DTD and schema validation. Costello reproduces the argument here from Mary Holstege:
[Many people think that validation is a pass/fail operation.] Not so, although lots of people are still stuck in that way of thinking, including, alas, a lot of the vendors. The schema design goes to great pains to make it possible to do things like this, for example: validate a document against a tight schema, and then ask questions of the result such as "show me all the item counts that failed validation because they were too high."
So this is a failure of vendor implementation, not W3C XML Schema? Rick Jelliffe says it's more a systematic problem with grammar-based approaches to validation. His Schematron technology offers a rule-based approach which doesn't suffer from the same issue.
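To make the contrast concrete, here is a minimal Python sketch of rule-based checking that returns a list of findings rather than a single verdict. It is neither Schematron nor W3C XML Schema, only an illustration of the idea; the order document, its element names, and the limit of 100 are all invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical order document; element names and values are assumptions.
ORDER_XML = """
<order>
  <item sku="A1"><count>3</count></item>
  <item sku="B2"><count>250</count></item>
  <item sku="C3"><count>-1</count></item>
</order>
""".strip()

MAX_COUNT = 100  # invented business limit


@dataclass
class Finding:
    rule: str   # which rule was violated
    sku: str    # which item violated it
    value: int  # the offending value


def check_order(xml_text):
    """Run every rule on every item and return all findings, not a verdict."""
    root = ET.fromstring(xml_text)
    findings = []
    for item in root.findall("item"):
        sku = item.get("sku", "?")
        count = int(item.findtext("count"))
        if count > MAX_COUNT:
            findings.append(Finding("count-too-high", sku, count))
        if count < 0:
            findings.append(Finding("count-negative", sku, count))
    return findings


findings = check_order(ORDER_XML)

# Ask a question of the result, in the spirit of Holstege's example.
too_high = [f for f in findings if f.rule == "count-too-high"]
for f in too_high:
    print(f"{f.sku}: count {f.value} exceeds {MAX_COUNT}")
```

The point of the design is that the caller, not the validator, decides what a given finding means: the same list can feed a report, a repair step, or a rejection.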
Fallacy of a Universal Validation Language
Regular readers of XML.com will perhaps already grasp this. Costello cites Dave Pawson's observation that the Atom format can't be validated against a single schema -- not even with RELAX NG!
Whether this is a wise decision on Atom's part is another matter.
Fallacy of Closed System Validation
The XML schema version of Murphy's Law. This one is summed up nicely in Costello's words:
Many people imagine that they can create a monolithic, invariant schema because "there's just me and my well-known trading partners." This statement fails to recognize the existence of a changing world; more precisely, a changing ecosystem.
Fallacy that Validation is Exclusively for Constraint Checking
This one is somewhat related to fallacy #4. Costello writes:
I suspect that many people have the same mentality that I have regarding validation: "An XML instance document arrives, I forward it to a validator tool, if the validator tool doesn't complain then forward the instance to some software to process it. If there's an error then discard the instance."
He goes on to observe that validation can play other roles in a system, such as a starting point for messages and events that cause subsequent system action, rather than merely errors.
Another theme Costello picks out is using schema validation results to mediate evolution in a system. Validation messages can be used to adapt the behavior of a system, rather than blindly assuming the user is wrong.
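A rough Python sketch of that idea follows: validation emits events, and a handler turns each event into an action instead of discarding the document. The address fields, the event names, and the actions are all hypothetical, chosen only to show the shape of such a system.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical set of element names the current schema knows about.
KNOWN_FIELDS = {"name", "street", "city", "postcode"}

# Tally of unexpected elements seen so far: evidence for revising the schema.
unknown_seen = Counter()


def validate(xml_text):
    """Yield (event, detail) pairs instead of failing on the first problem."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    for tag in sorted(present - KNOWN_FIELDS):
        yield ("unknown-element", tag)
    for tag in sorted(KNOWN_FIELDS - present):
        yield ("missing-element", tag)


def handle(xml_text):
    """Route each validation event to an action; nothing is thrown away."""
    actions = []
    for event, detail in validate(xml_text):
        if event == "unknown-element":
            unknown_seen[detail] += 1
            actions.append(f"accept and record new field '{detail}'")
        elif event == "missing-element":
            actions.append(f"queue for manual review: no '{detail}'")
    return actions or ["process normally"]


# Invented instance that uses a field the schema author never anticipated.
doc = (
    "<address><name>A. User</name><city>Caracas</city>"
    "<urbanizacion>El Rosal</urbanizacion></address>"
)
actions = handle(doc)
for action in actions:
    print(action)
```

If the tally in `unknown_seen` keeps growing for the same field, that is a signal that the schema, not the user, needs to change.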
A really useful summary. While some of the "fallacies" are perhaps better framed as anti-patterns, I think everyone who uses a schema with XML should read them in order to broaden their thinking about validation and its uses.
Chiefly amusing among the three specifications marching toward Candidate Recommendation status at the W3C is the Resource Representation SOAP Header Block. Before even looking at the specification, something smells a little strange about this. Chapter 2, verse 21 of the Book of REST teaches us that a resource representation is what you get when you retrieve content by resolving a URI. In everyday life, a resource representation is what you get in response to an HTTP request.
It is then with an element of fear I look to see what this specification can possibly contain. The introduction is plain enough:
This document describes the semantics and serialization of a SOAP header block for carrying resource representations in SOAP messages.
I am afraid this does not calm my nerves much. Are we then headed toward the re-encapsulation of HTTP in, erm, SOAP? The example at the end of section 4.1 seems to confirm this. The example includes a PNG image, base-64 encoded, in the header, and an XML body that refers to that image by URI. The example itself also gets amusingly close to being RDF.
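For readers who want to see the shape of the thing, here is a hedged Python sketch of the pattern: a representation is base-64 encoded into a header block, and the receiver resolves the URI in the body from that header instead of over HTTP. The SOAP 1.2 envelope namespace is the real one, but the `urn:example:representation` namespace, the element and attribute names, the `photo` body element, and the image bytes are placeholders of mine, not taken from the specification.

```python
import base64
import xml.etree.ElementTree as ET

SOAP = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
REP = "urn:example:representation"  # placeholder, NOT the specification's namespace

IMAGE_URI = "http://example.org/me.png"     # hypothetical resource URI
IMAGE_BYTES = b"\x89PNG placeholder bytes"  # stand-in, not a real image


def build_message():
    """Build an envelope whose header carries the bytes the body refers to."""
    env = ET.Element(f"{{{SOAP}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP}}}Header")
    rep = ET.SubElement(header, f"{{{REP}}}Representation", {"resource": IMAGE_URI})
    data = ET.SubElement(rep, f"{{{REP}}}Data", {"contentType": "image/png"})
    data.text = base64.b64encode(IMAGE_BYTES).decode("ascii")
    body = ET.SubElement(env, f"{{{SOAP}}}Body")
    ET.SubElement(body, "photo", {"src": IMAGE_URI})  # refers to the image by URI
    return ET.tostring(env, encoding="unicode")


def resolve(message, uri):
    """Look the URI up in the header rather than fetching it over HTTP."""
    env = ET.fromstring(message)
    for rep in env.iter(f"{{{REP}}}Representation"):
        if rep.get("resource") == uri:
            return base64.b64decode(rep.findtext(f"{{{REP}}}Data"))
    return None  # not carried in this message


message = build_message()
src = ET.fromstring(message).find(".//photo").get("src")
recovered = resolve(message, src)
```

Note what `resolve` is doing: it is a tiny, message-local stand-in for an HTTP GET, which is exactly the re-encapsulation I am grumbling about.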
This specification seems to be the final resolution of SOAP with Attachments. The circle is now complete! I now confidently predict that it will not be long before we find entire SOAP messages themselves encoded in base 64 and carried around in the headers of other SOAP messages. To further my amusement, I offer this challenge: If one can be found in the wild before XML 2004 this year, I will buy the finder a drink at the conference bar.
I will spare you further expostulation about what's wrong with HTTP anyway for such things, and the lunacy of using XML as a general container for any media type; and, actually, what about BEEP, which solved this problem years ago?
Instead, a little ditty adequately sums up my feelings. With apologies to De Morgan:
SOAP specs have little specs upon their backs to spite 'em
And little specs have lesser specs, and so ad infinitum.
Births, Deaths, and Marriages
The latest announcements from the XML-DEV mailing list.
- XULNews.com Live
Source for news on the Mozilla XUL user interface technology. Web interface to the xul-announce mailing list.
- Orbeon.org: Open Source XML-Based Integration
If you look behind the buzzwords, the exciting thing here is an open source XForms implementation.
- Topologi Professional Edition 2.1
Rick Jelliffe's editor for the markup professional. Hard to beat.
- XForms link directory
Useful-looking bunch of XForms links.
- SSYN: Alternative to Data-centric XML and YAML
The latest chapter in the attempt by Python programmers to remodel XML in their own image.
- XMLmind XML Editor V2.7
XML editor, kitchen sink, and lifestyle assistant adds support for many image formats, including SVG, command-line access to editor functions, and plugin enhancements.
You don't get much more nostalgic than this ... proving XML-DEV is educational in every way, I learned my new word of the week: bloviate ... 126 messages to XML-DEV last week, Len rating 25% (happily the blog isn't getting all the attention) ... adherence to schemas causes trouble, from Venezuela to an English coal shed.
Schema as editing guide
I like this article, because it shows some practicality (?). Let me add another side: the editorial document point of view.
I am an XML document professional, and the main use I have always seen for DTDs was as an editing guideline (and constraint). This is a way to ensure the document follows a certain model, therefore providing all the information you will need to use or publish this document afterward.
Now I know Schema were created more for data exchange than document encoding. But I believe this is still a point to be looked at: the schema is, for me, a way to guide a generic tool (editor, code generator, interface generator) toward a suitable interaction with the user.
So more than validation (which I frankly don't care about that much: better to produce a correct instance than to refuse an incorrect one afterward), Schema should be seen as guidelines.
As for SOAP, I'd like someone to point out for me from which functional perspective it is better than a REST-style interface...