A Path to Enlightenment
August 29, 2001
XML-DEV has been busy recently with a number of long-running threads that have drawn some interesting postings. Their underlying theme has been orientation: how best to understand and come to terms with particular technologies, ranging from Schemas to Web Services. This week the XML-Deviant walks the XML-DEV path to enlightenment to see where it leads.
Web Services Debunked
The first steps on the path are familiar territory: deflating unnecessary hype. This discussion began in response to Edd Dumbill's recent Taglines column. Of particular interest was the accusation that marketing fanfare is overhyping "web services".
The response was mixed. While most agreed that the hype was selling more than could reasonably be delivered, others remained adamant that technologies such as SOAP brought real value. Michael Brennan was among them:
We've been doing integrations across the web using XML messaging for several years now -- integrating with CRM systems, order entry systems, synchronizing user profile info with directory services, providing single sign-on solutions that integrate portals with hosted solutions across the web. It's been working just fine. When SOAP came along, we aligned our approach to SOAP. It's still working just fine. And last year we extended our approach to include SOAP-based integrations with desktop productivity tools -- MS Outlook and Excel -- allowing users to leverage our service from non-browser tools, and to be able to synchronize data with applications employed for offline use.
Those who don't see the proof that this works are simply not looking.
Brennan's comments reflect what seems to be a common perception: web services are simply a reformulation, albeit using a new set of technologies, of what many have been doing for years. Michael Champion expounded this view, suggesting that SOAP over HTTP is simply an alternative to "URLs from Hell":
I personally (obligatory disclaimer ...) suspect that SOAP over HTTP will find its niche mainly as a cleaner, more standardized way of doing what people have been doing with HTTP parameters and CGI scripts "forever". I've sweated over the production and parsing of enough URLs from Hell that I grok the SOAP / UDDI / WSDL vision of doing this in a more orderly manner. Whether that provides a solid foundation for Yet Another Paradigm is another matter entirely.
It hardly sounds like a new paradigm for application development, does it? In a later posting, Champion also explored answers to the question: what are web services good for?
Another part of the discussion clarified what SOAP actually is. At various times it has been compared to CORBA and similar distributed object systems, and similar comparisons have been drawn between XML-RPC and CORBA. However, the comparison isn't justified. Joshua Allen provided a clear appraisal of how SOAP fits into the distributed object framework:
You could probably consider SOAP and CORBA as complementary. SOAP to IIOP might be a better comparison. The three "big" object server models out there have been CORBA, EJB, and COM+ -- these three use IIOP, RMI, and DCOM respectively as the primary method to pass information to and from objects. Now that SOAP is on the scene, CORBA, EJB and COM+ don't go away, they just have another way to pass information to and from objects. In fact, before SOAP, there were many ways to get these three different worlds to interoperate -- the difference with SOAP is that the interop layer is based on XML, supposedly easier to implement than something like an RMI/DCOM bridge, and so on. For example, if I have some objects written in CORBA that provide some service, I no longer have to convince all of my customers to install an IIOP communication layer. With SOAP, the layer that calls my CORBA object could be as simple as a UNIX bash script that pipes some text through netcat. So I think of SOAP as being a universal IIOP/RMI/DCOM substitute that mere mortals can type by hand.
Henrik Frystyk Nielsen's explanation was pithier and more to the point:
I would just as a reminder like to point out that SOAP doesn't aspire to be a distributed object system. It is nothing more than a wire protocol.
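To make the "wire protocol" point concrete, here is a minimal Python sketch that encodes the same call in two ways. The first is the kind of CGI query string Champion describes. The second is a SOAP 1.1 envelope, plain XML text that could be typed by hand as Allen suggests. The service namespace `urn:example:quotes`, the `getQuote` method and the helper names are hypothetical. Only the envelope namespace URI comes from SOAP 1.1. The sketch deliberately leaves out the HTTP binding (headers such as `SOAPAction`), encoding-style attributes and fault handling, and it assumes parameter names are valid XML element names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from xml.sax.saxutils import escape

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:quotes"  # hypothetical service namespace


def cgi_url(base, method, params):
    """The 'URL from Hell' style: the whole call is flattened into a query string."""
    return f"{base}?{urlencode({'method': method, **params})}"


def soap_envelope(method, params):
    """The same call as a SOAP 1.1 request body: ordinary, namespaced XML text."""
    args = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in params.items())
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_ENV}"><soap:Body>'
        f'<m:{method} xmlns:m="{SERVICE_NS}">{args}</m:{method}>'
        "</soap:Body></soap:Envelope>"
    )


def parse_call(envelope):
    """Receiving side: recover (method, params) with nothing but an XML parser."""
    body = ET.fromstring(envelope).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body carries the call
    method = call.tag.split("}", 1)[1]  # drop the {namespace} part of the name
    return method, {child.tag: child.text or "" for child in call}


if __name__ == "__main__":
    print(cgi_url("http://example.com/cgi-bin/quote", "getQuote", {"symbol": "IBM"}))
    print(soap_envelope("getQuote", {"symbol": "IBM"}))
```

Either string can be sent over a raw connection. The difference is that the envelope is structured XML that any parser can take apart, as `parse_call` shows. That is roughly the "more orderly manner" Champion refers to, and it adds no object model of its own.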
And, as Michael Brennan explained, the Web Services Description Language (WSDL) completes the picture by providing functionality that COM and EJB developers have long been using.
WSDL was motivated by concrete experience with early SOAP implementations. For those who tried to develop tools that supported creating client side interfaces that map to a specific service -- as any developer using CORBA, EJBs, or COM is accustomed to -- it was clear that something like this was needed.
After only a brief journey down our path, we've learned an interesting lesson. Web services, as embodied by SOAP and WSDL, don't offer any functionality that developers haven't already had for some time. But SOAP and WSDL achieve it in a way that is potentially more open and cross-platform. While these are certainly laudable goals, there doesn't seem to be as much to the web services revolution as many would have us believe.
Moving on, it seems that articles like Don Smith's "Understanding W3C Schema Complex Types" are helping users come to terms with W3C XML Schemas. Other comments on XML-DEV suggest that many are making progress in migrating away from DTDs. This seems particularly true for less ambitious uses, as Len Bullard related:
I think XML Schemas are too hard because we aren't really sure what they are supposed to do... For me they are easy because... most of what I want to represent can be done in DTDs. Still, I find myself creating restricted simple types for reuse to pick up the extra power of regexes and that is a step beyond DTDs.
I can't say that I totally understand model groups, substitution groups and all of the inheritance rules.
OTOH, I believe I can write simple schemas w/o ever using these features. I am a firm believer that simple tasks should be easy to do. Sometimes complex tasks are hard to do and there is no getting around it, so perhaps some of the complexities of XML Schema are necessary. As long as they don't get in the way of the simple tasks I'd be happy.
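Bullard's reusable restricted simple types can be illustrated with a short Python sketch. The type names and patterns are hypothetical examples, and a comment gives the equivalent W3C XML Schema declaration for one of them. Python's `re.fullmatch` stands in for the `xs:pattern` facet because schema patterns are implicitly anchored to the whole value. The two regex dialects are not identical, though, so treat this as an approximation that holds for simple patterns like these.

```python
import re


class RestrictedString:
    """Sketch of an xs:string restricted by a single xs:pattern facet."""

    def __init__(self, name, pattern):
        self.name = name
        self._regex = re.compile(pattern)

    def is_valid(self, value):
        # Schema patterns must match the entire value, hence fullmatch.
        return self._regex.fullmatch(value) is not None


# Hypothetical reusable types. The first corresponds roughly to:
#   <xs:simpleType name="PartNumber">
#     <xs:restriction base="xs:string">
#       <xs:pattern value="[A-Z]{3}-[0-9]{4}"/>
#     </xs:restriction>
#   </xs:simpleType>
PART_NUMBER = RestrictedString("PartNumber", r"[A-Z]{3}-[0-9]{4}")
US_ZIP = RestrictedString("USZip", r"[0-9]{5}(-[0-9]{4})?")
```

The best a DTD can declare for such an element's content is `#PCDATA`, which accepts any text at all. Here, `PART_NUMBER.is_valid("ABC-1234")` is true, while `"abc-1234"` and `"ABC-12345"` are rejected.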
However, as Joe English warned, no system is an island, and others may be far more ambitious:
...how many of these features are you going to encounter as you interact with other data systems? That's the real problem with complex schema languages. No system is an island, and sooner or later you're going to have to understand all what you're being given, not just what you produced. Even with good tools, concepts can be hard to grasp, which is the point.
Slightly further down the road, we learned that W3C XML Schemas are about more than just validation. This is no great surprise, but it's useful to see it clearly spelled out. Interestingly, Michael Brennan observed that this may be the cause of some of the frustration directed at the XML Schema specification:
I [...] think that issues about things Schemas cannot represent are probably a frustration to those trying to use it as a validation language. I know that we have been adapting some of our integration interfaces to use XML structures that can be adequately expressed with XML Schema. I think you have to question the value of a validation language when you find that you are redesigning your XML structures to accommodate the weaknesses of the schema language. In our case, this means changing our structures such that an element's content model is not identified by an attribute value. RELAX NG can accommodate this, but XSD cannot.
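Brennan's case is a co-occurrence constraint: the value of an attribute selects the element's content model. The Python sketch below hard-codes such a check for a hypothetical `payment` element. A comment shows one way the same rule might be written in RELAX NG's compact syntax, as a choice between two patterns that each fix the attribute value. W3C XML Schema 1.0 has no direct way to express this.

```python
import xml.etree.ElementTree as ET

# In RELAX NG compact syntax the rule could be written as a choice whose
# branches each fix the attribute value:
#   element payment {
#       (attribute method { "card" }, element number { text }, element expiry { text })
#     | (attribute method { "invoice" }, element account { text })
#   }
# Hypothetical table: attribute value -> required sequence of child elements.
CONTENT_MODELS = {
    "card": ["number", "expiry"],
    "invoice": ["account"],
}


def check_payment(elem):
    """Return error messages for one <payment> element (empty list if valid)."""
    method = elem.get("method")
    if method not in CONTENT_MODELS:
        return [f"unknown payment method {method!r}"]
    expected = CONTENT_MODELS[method]
    actual = [child.tag for child in elem]
    if actual != expected:
        return [f"method={method!r} requires {expected}, found {actual}"]
    return []


def validate(xml_text):
    """Check every <payment> element in a document, including the root."""
    root = ET.fromstring(xml_text)
    return [msg for elem in root.iter("payment") for msg in check_payment(elem)]
```

The point of the sketch is the shape of the rule, not the code. The content model cannot be chosen until the attribute has been read, and this is exactly the dependency Brennan's team designed out of their structures.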
This is an intriguing observation, as it implies that RELAX NG may be the appropriate choice if all you need is a schema language slightly more sophisticated than DTDs. RELAX NG is solely about validation and is therefore likely to be a better fit for that particular use case. Further, if you need only simple datatyping, a mix of RELAX NG and Schematron may be enough. Rick Jelliffe demonstrated this week that Schematron can be used for typechecking:
Most other schema languages have built-in types. I guess that since people will tend to evaluate schema languages using a check-box, they might put "no datatyping" on Schematron, when really they mean no "built-in" data types (apart from the XPath ones: number, string, boolean).
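One common way to write a numeric check in Schematron relies on two facts. XPath 1.0's `number()` returns NaN for a non-numeric string, and NaN never equals itself. An assertion such as `number(.) = number(.)` therefore fails for non-numbers. The Python below roughly emulates that idea and is not a Schematron implementation. `xpath_number` follows the XPath 1.0 string-to-number rule. The rule tuples (context element name, test, message) are a hypothetical simplification of Schematron's `rule` and `assert`.

```python
import math
import re
import xml.etree.ElementTree as ET

# XPath 1.0 string-to-number: optional XML whitespace, an optional minus
# sign, ASCII digits with an optional fraction, optional whitespace.
# Anything else ("1e5", "+1", "", "two") converts to NaN.
_XPATH_NUMBER = re.compile(r"[ \t\r\n]*(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))[ \t\r\n]*")


def xpath_number(text):
    match = _XPATH_NUMBER.fullmatch(text)
    return float(match.group(1)) if match else math.nan


def is_number(value):
    # Emulates the test number(.) = number(.), which is false only for NaN.
    n = xpath_number(value)
    return n == n


# Hypothetical rules: (context element name, test on string value, message).
RULES = [
    ("price", is_number, "price must be a number"),
    ("qty", is_number, "qty must be a number"),
]


def check(xml_text, rules):
    """Return the messages of all failed assertions, in rule order."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in rules:
        for elem in root.iter(context):
            value = "".join(elem.itertext())  # XPath string-value of the element
            if not test(value):
                failures.append(message)
    return failures
```

No type system is declared anywhere. The "datatype" is simply an assertion over the XPath string, number and boolean values that Jelliffe mentions.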
A Departure from Type
Arguments about types seem to fill the landscape this week, as runoff from the schema and namespace debate summarized in last week's XML-Deviant continues. (This is not too surprising, perhaps, given that those topics have probably consumed more XML-DEV threads than any others.)
It seems that there are different viewpoints about where and how, or even whether, type information is associated with XML markup. The core issues relate to whether one is working with simple well-formed markup or with systems that involve validation and strong typing. Tim Bray has been at the center of this discussion, arguing at length that properly labeled markup is the key to maximum flexibility.
Q1: Why would you use XML?
A1: One of the important reasons is so that you can re-use data for purposes other than those envisioned by its creator. This is why, in the document space, XML is an unqualifiedly better storage format than MS Word, Frame, PDF, or any other proprietary binary display-oriented format. A lot of the XML-as-serialized-objects people probably don't care that much about this, but I think they're missing an important boat. Computers are important because they are *general-purpose* machines, and to the extent that you can make data general-purpose as well, you win.
Q2: Why would you use namespaces?
A2: One of the important reasons is so you can pull together data objects from multiple sources without losing track of where the pieces come from.
If you believe A1 and A2, then it seems to me like you get maximum re-usability and ability to mix-n-match if you've got everything unambiguously and completely labeled with minimal reliance on context.
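Bray's point about unambiguous labels shows up in how a namespace-aware parser reports names. In this Python sketch, the order vocabulary and its `urn:example:orders` namespace are hypothetical, while the Dublin Core namespace is real. ElementTree expands every element name to `{namespace-URI}local-name`, so two documents that choose different prefixes produce identical labels.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
ORDERS = "urn:example:orders"             # hypothetical vocabulary

# The same data, written with different prefix choices.
DOC_A = f"""<order xmlns="{ORDERS}" xmlns:dc="{DC}">
  <dc:creator>Ann</dc:creator>
  <item sku="A-1"/>
</order>"""

DOC_B = f"""<o:order xmlns:o="{ORDERS}" xmlns:meta="{DC}">
  <meta:creator>Ann</meta:creator>
  <o:item sku="A-1"/>
</o:order>"""


def labels(xml_text):
    """Fully qualified {namespace}local names of all elements, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


def split_label(label):
    """Split '{uri}local' into (uri, local); unqualified names get uri None."""
    if label.startswith("{"):
        uri, local = label[1:].split("}", 1)
        return uri, local
    return None, label
```

Once parsed, each piece carries its origin with it. A consumer can therefore pull `creator` out of either document without knowing which prefix the author happened to use.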
Eschewing "type" as the key to enlightenment, Bray also presented a worldview that he argued was consistent with all interpretations: XML + Namespaces is a means to associate labels with data structures. One is free to build any kind of architecture on this, including one that is strongly influenced by type information.
While this may be good architectural design, since it does not constrain decisions in other layers, several people, including Paul Prescod, were concerned that the full range of possible architectures isn't being explored. In practice, they argued, typing is leading the way:
For better or for worse, the emerging XML architecture DOES elevate schemas, validation and ty[p]e declarations above other "XML processing applications". For example SOAP and WSDL implementations use XML schema types to do type conversions. WSDL actually uses XML Schema as some kind of abstract type definition system (completely distinct from its use as an XML validation tool). XSLT 2.0 and XPath 2.0 are also going to use information from the schema.
These specifications do not build on XML Schema for its validation facilities. They do for its t[ype] system. So flaws in that system will eventually become material to all XML users. Some future applications may not deal with element labels (or ulabels) at all. They will deal with t[ype] names.
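Prescod's observation that SOAP implementations lean on XML Schema types for type conversion can be sketched as follows. The Python below maps `xsi:type` values to native values in the spirit of SOAP encoding. The converter table and sample message are hypothetical. For brevity, the code matches literal strings such as `xsd:int`, on the assumption that the `xsd` prefix is bound to the 2001 XML Schema namespace. A real implementation must resolve the prefix against the in-scope namespace declarations, which ElementTree does not retain.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def _xsd_int(text):
    value = int(text)
    if not -2**31 <= value < 2**31:  # xs:int is a 32-bit signed integer
        raise ValueError(f"out of range for xs:int: {text!r}")
    return value


def _xsd_boolean(text):
    text = text.strip()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean: {text!r}")


# Simplification: keys are literal xsi:type values, assuming the prefix
# "xsd" is bound to http://www.w3.org/2001/XMLSchema (not checked here).
CONVERTERS = {
    "xsd:int": _xsd_int,
    "xsd:double": float,
    "xsd:boolean": _xsd_boolean,
    "xsd:string": str,
}

SAMPLE = """<getQuote xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <symbol xsi:type="xsd:string">IBM</symbol>
  <shares xsi:type="xsd:int">100</shares>
  <limit xsi:type="xsd:double">101.25</limit>
  <marginOK xsi:type="xsd:boolean">false</marginOK>
  <note>untyped</note>
</getQuote>"""


def decode_params(xml_text):
    """Map each child element of the root to a native Python value."""
    result = {}
    for child in ET.fromstring(xml_text):
        convert = CONVERTERS.get(child.get(XSI_TYPE), str)  # untyped: keep text
        result[child.tag] = convert(child.text or "")
    return result
```

The element names hardly matter to this code. What drives the conversion is the schema type name, which illustrates Prescod's remark that some applications will "deal with type names" rather than labels.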
Publication of the XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0 Working Draft clearly demonstrates Prescod's point.
These discussions hint at further splits in the community. Is there a potential fork in our road? Some of us are interested only in well-formed documents, while others want to mix in a constraint (validation) mechanism. Both groups seem to need much less than is currently being designed. Still others want strong typing and object-oriented features, and they currently seem to be the best served by the latest W3C deliverables. But their satisfaction may come at the expense of those seeking a simpler existence.
Reverend Occam's Barber Shop
This wouldn't be the first time that a fork in the road has been highlighted. In fact, in another thread this week the same suggestion, simplification through refactoring, was made several times. Pertinent to the previous discussion, Alexander Nakhimovsky suggested refactoring W3C XML Schemas:
...it would be a good thing, IMO, if XML Schema were *re-factored* into the validating part (such as RELAX NG) and a "complex-type-relations" part, for use in specialized applications.
An earlier discussion of the use of XPointer within XInclude showed that a streaming processing model is not possible without some kind of subsetting (ideally of XPath). Following it, Sean McGrath made a plea to the W3C to explore further subsetting of its output:
The W3C could stop for breath and find out what pieces of XML 1.0 the majority *really* use. Don't ask vendors - they are not a reliable source of information. Don't ask consultants, their business case is based on complexity. Don't ask theoreticians, they find it all easy (and even if they don't they will probably say they do as they have egos like the rest of us and they are paid to be really smart).
Instead, ask XML users. Zoom in on the uncommonly used bits that cause the most problems for the ancillary specifications. Work towards issuing new iterations of the core specifications that take things OUT rather than add stuff in. A bold, brave step that would differentiate W3C from all the previous tower-of-babel standards bodies.
Do it as an experiment. Do it as a controlled fork. If it does not yield benefits, scrap it.
Some argued that this isn't as easy as it might seem. Henry Thompson wondered how one identifies the users who matter:
There is a fundamental question buried within your request: who are the users whose voices matter, and how do we identify them? Remember Lot pleading for Sodom and Gomorrah? How many people really using feature F of XML 1.0 + Namespaces + ... does it take in order to render it safe from pruning?
Michael Champion agreed with Thompson. He believed that a close shave at Reverend Occam's Barbershop, while overdue, would be a hard-fought battle:
Like all zealots, I'm firmly convinced that "the people" are on our side <grin> and that very few of the paying customers would object if XML+namespaces+infoset+schemas got a close shave at Rev. Occam's Barbershop. On the other hand, I know that plenty of XML specialists would raise hell if their favorite feature was shaved off. So I have the same question "who are the users whose voices matter and how do we identify them?" I don't have any great suggestions.
Yet these aren't so much arguments against attempting the task as an acknowledgment that no subset is likely to please everyone. If the same viewpoint had won out several years ago, we wouldn't have XML in front of us now. XML is itself a subset of SGML, and that subsetting hasn't stopped useful work from being done. It's also worth noting that many argue for refactoring rather than subsetting. Refactoring retains functionality while improving the architecture, so there are other means to the same end. Smaller, refactored specifications also lend themselves to being put together in different ways, with alternate pieces in key positions. This seems a good way to achieve greater satisfaction.
So, at the end of this week's jaunt, we've come full circle. For much of the last year the same obstacles have been in our way. One wonders how long this will continue before new ground is broken.