XML on the Move

February 21, 2001

Edd Dumbill

On the first day of XML DevCon Europe in London, England, speakers highlighted the growth of XML in its three years of existence. Henry Thompson from the University of Edinburgh (and zealous editor of the W3C's XML Schema specification) noted in his opening keynote that XML had grown from one specification to a family of technologies. He focused on the emerging centrality of the XML Infoset and XML Schema. David Orchard of Jamcracker taught a session on web services, XML, and UDDI. Despite XML's growth in the area of program-to-program communication, there's still much to build.

Infoset Pipelining

In surveying the growth of XML specifications such as XPath and DOM, Henry Thompson observed that each had to supply what XML 1.0 was missing: a data model. But the data models they defined differed from one another. Hence the development of the XML Information Set specification, which specifies a data model for XML documents and is meant to provide a reference model for further XML specifications.

The Infoset is essentially a distillation of the vital parts of an XML document after it's been parsed. According to Thompson, the Infoset leaves out the "uninteresting" parts of a document, such as whether attribute values use double or single quotes, the amount of whitespace outside of elements, and whether empty elements are written with one tag or two. Of course, Thompson noted, XML editing applications need to know these things, but XML processing applications don't.
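As a rough illustration of the distinction Thompson drew, the sketch below parses two documents that differ only in quoting style, whitespace inside a tag, and empty-element form, then compares what a processing application would actually see. It uses Python's standard ElementTree module as a stand-in; ElementTree is not an Infoset implementation, and the `summarize` helper and the sample documents are invented for this example.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in "uninteresting" details:
# quote style, whitespace inside a tag, and how the empty element is written.
DOC_A = '<order id="42"><item/></order>'
DOC_B = "<order   id='42' ><item></item></order>\n"


def summarize(elem):
    """Reduce a parsed element to the parts a processing application cares about.

    Hypothetical helper: it keeps the element name, attributes, text content,
    and children, and ignores everything about how the document was typed.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        [summarize(child) for child in elem],
    )


summary_a = summarize(ET.fromstring(DOC_A))
summary_b = summarize(ET.fromstring(DOC_B))

# After parsing, the lexical differences between the two documents are gone.
same_information = summary_a == summary_b
```

An XML editor, by contrast, would need to keep the original text around in order to preserve the author's quoting and spacing choices.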

Systems that operate on XML documents can be thought of as processing pipelines for infosets. When a document is parsed, an infoset is created, which may then be validated against a schema, after which the infoset is augmented with type information. The resulting infoset is called the "post schema validation infoset" or PSVI. The infoset may then have an XSLT transform applied to it, finally being serialized back to XML. In this world, the XML documents we are all used to, angle brackets and all, become merely hosts for the propagation of infosets.
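The pipeline described above might be sketched as a chain of small functions, each taking a parsed document and handing a parsed document to the next stage. This is only an illustration under loud assumptions: the standard library has neither an XML Schema validator nor an XSLT processor, so `validate` and `transform` are hand-written stand-ins, and the `TYPES` table, the element names, and the `type` attribute used to mimic PSVI annotations are all hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical stand-in for a schema: element name -> expected Python type.
TYPES = {"quantity": int, "price": float}


def parse(text):
    """Stage 1: turn the serialized document into an in-memory tree."""
    return ET.fromstring(text)


def validate(root):
    """Stage 2: check typed elements and annotate them, loosely like a PSVI."""
    for elem in root.iter():
        caster = TYPES.get(elem.tag)
        if caster is not None:
            caster(elem.text)  # raises ValueError if the content is invalid
            elem.set("type", caster.__name__)
    return root


def transform(root):
    """Stage 3: stand-in for an XSLT transform; builds a new result tree."""
    total = int(root.findtext("quantity")) * float(root.findtext("price"))
    invoice = ET.Element("invoice")
    ET.SubElement(invoice, "total").text = f"{total:.2f}"
    return invoice


def serialize(root):
    """Stage 4: write the tree back out as angle brackets."""
    return ET.tostring(root, encoding="unicode")


def run_pipeline(text, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda value, stage: stage(value), stages, text)


ORDER = "<order><quantity>3</quantity><price>2.50</price></order>"
result = run_pipeline(ORDER, [parse, validate, transform, serialize])
```

Only the first and last stages touch serialized text; everything in between passes structured data, which is the sense in which Thompson called such pipelines "fat".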

Thompson emphasized the usefulness of pipelines as originally implemented in Unix. Large systems can be composed from simple, modular subsystems in this fashion. Whereas Unix pipelines are "thin", passing only character streams, XML Infoset pipelines are "fat", as they pass structured data from process to process.

Concluding his talk, Thompson encouraged us to think of XML applications in terms of infosets and pipelines of infosets. He stressed the importance of using XML Schemas to facilitate mapping between data structures and XML documents, referring to efforts like the Schema Adjunct Framework.

Web Services -- Not As Mature As You'd Think

David Orchard from Jamcracker gave a talk on the world of web services, XML, and UDDI. In a change from the usual breathless hype about the future of web services, Orchard began by promising to include some nay-saying as well as hype. Explaining that "web services" was simply the "sexy" name for XML over HTTP, he outlined the reasons why web services were taking off. Apart from the well-known advantages of XML, Orchard outlined the benefits of HTTP: a robust operational architecture; a simple, mature infrastructure; reliability and scalability; a standards-based design; "good enough" performance; and well-defined APIs for application programming (e.g. Java servlets).
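To make "XML over HTTP" concrete, here is a minimal sketch that builds an HTTP POST request whose body is a small XML document. Nothing here comes from Orchard's talk: the service URL, the `getQuote` and `symbol` element names, and the `build_request` helper are all hypothetical, and the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the .invalid domain is reserved and never resolves.
SERVICE_URL = "http://example.invalid/quote-service"


def build_request(symbol):
    """Wrap a made-up XML request document in an HTTP POST."""
    body = ET.Element("getQuote")
    ET.SubElement(body, "symbol").text = symbol
    payload = ET.tostring(body, encoding="utf-8")  # bytes for the HTTP body
    return urllib.request.Request(
        SERVICE_URL,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_request("XYZ")
```

Passing `request` to `urllib.request.urlopen` would send it to a real endpoint; the server would parse the XML body and typically answer with another XML document.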

Orchard described the general features of web service protocols and then moved on to UDDI, which is the new kid on the block since he last gave this talk about a year ago in New York. UDDI (Universal Description, Discovery and Integration) provides directory services for businesses offering web services. UDDI is unusual in the current environment of XML specification development: it's managed by a closed, independent consortium. The intention is ultimately to give the UDDI specifications to a standards body, but Orchard suggested that the effort has stayed closed so far because the consortium's work is considered too changeable or unstable to invite public participation.

He also raised doubts about the current maturity of UDDI, and its ability to perform in real-world deployment situations. He highlighted its lack of distributed queries and a generic query syntax; the uncertainty about whether replication would scale; the lack of support for versioning; and, in particular, the weakness of the descriptive power of UDDI's tModel metadata structure. In summary, Orchard said that UDDI was an "interesting experiment" and had about a "50% chance of success."

Though a confirmed supporter of web services, Orchard also presented an honest view of the current web services world. He concluded by identifying some areas that need improvement.

  • Server programming model: whereas more established server technologies have defined server-side programming models (for instance, the Java Servlet interface), web services currently lack a standard API for binding language implementations to service requests. This interface also needs to solve problems such as storing session information or specifying whether the XML will be passed as DOM or a character stream.
  • Request in one document: Orchard said that mixing content and header/routing information in one document is problematic. When a header is mixed into the document, it makes it difficult to mix in custom content -- for instance, the content DTD might preclude the presence of header elements. Although he noted the work on SOAP attachments, Orchard commented that this needs further attention.
  • XML not intended for B2B: noting that XML was originally designed with "write once, view anywhere" in mind, Orchard complained that insufficient attention has been given to the use of XML on the server side. Only now, with the XML Protocols Activity, is the W3C starting to give attention to it. One comment he made was that XML needs something similar to Sun's J2EE label, a designation of a "Unified XML" which denotes support for a certain set of XML specifications. To say only that a product is XML-enabled is largely meaningless.
  • Security: Little work has yet been done on security with technologies like SOAP. One problem in this space is how to pass a message body securely while retaining routing information. SOAP through port 80 also makes the firewall administrator's life more difficult. Orchard observed, however, that this seems to be inevitable.
  • Missing functionality: Finally, Orchard observed that there were many features from more mature infrastructures still missing. From the world of distributed objects he identified type-safety, discovery, versioning, service metadata, and object activation. From message-oriented-middleware he noted security, transactions, guaranteed delivery and asynchronous operation. Work is underway on some of these but isn't yet at the deployable stage.

The take-home message was that web services are indeed an exciting area of development, but if you want to deploy a significant web service right now, there is quite a lot of infrastructure work you'll need to do yourself.