Wrap Your App
November 21, 2001
Reporting on the most recent XML-DEV discussions, the XML-Deviant finds growing support for getting the "packaging problem" wrapped up once and for all.
There's been a lot of talk about interoperability on XML-DEV this week, one result of which has been renewed discussion of resource management in XML applications. A typical XML application might use several kinds of resources: one or more schemas (possibly in several different schema languages), entity files, CSS and XSLT stylesheets, and so on. Of course this isn't true of all applications, but it's fair to say that any reasonably complex application will use some, perhaps many, kinds of external resource.
Managing these resources -- programmatically creating, storing, retrieving them -- will vary from application to application, although there's been effort in some areas to provide standard facilities. At the lowest level is entity management: entity files, DTDs, and so on, which can be managed using an appropriate catalog. We've covered this topic in the past, and support is slowly growing. The SAX developer mailing list has been the forum for some recent discussion on improvements to the SAX 2 API that should facilitate even better entity management and catalog support.
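To make the catalog idea concrete, here is a minimal sketch using the `EntityResolver` hook in Python's standard SAX library. The "catalog" is an in-memory dictionary standing in for a real catalog file; the DocBook public and system identifiers are genuine, but the local file paths are hypothetical. The lookup order (system identifier first, then public identifier) only roughly follows OASIS XML Catalogs; a real resolver also handles delegation, rewriting, and chained catalog files.

```python
import xml.sax
from xml.sax.handler import EntityResolver

# Hypothetical local copies; only the DocBook identifiers themselves are real.
LOCAL_DTD = "file:///usr/share/xml/docbook/4.1.2/docbookx.dtd"

SYSTEM_IDS = {
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd": LOCAL_DTD,
}
PUBLIC_IDS = {
    "-//OASIS//DTD DocBook XML V4.1.2//EN": LOCAL_DTD,
}


class CatalogResolver(EntityResolver):
    """Map external entity identifiers to local copies via a simple catalog."""

    def __init__(self, system_ids, public_ids):
        self.system_ids = system_ids
        self.public_ids = public_ids

    def resolveEntity(self, publicId, systemId):
        # Try the system identifier first.
        if systemId in self.system_ids:
            return self.system_ids[systemId]
        # Then the public identifier, if one was supplied.
        if publicId in self.public_ids:
            return self.public_ids[publicId]
        # No catalog entry: fall back to the original location.
        return systemId


def make_parser_with_catalog():
    """Create a SAX parser that consults the catalog for external entities."""
    parser = xml.sax.make_parser()
    parser.setEntityResolver(CatalogResolver(SYSTEM_IDS, PUBLIC_IDS))
    return parser
```

The point of the design is that the application never hard-codes where a DTD lives: swapping in a different catalog redirects every lookup without touching parsing code.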
RDDL (Resource Directory Description Language) is another community effort, aimed at describing the resources that may, but need not, be associated with an XML Namespace URI. Unfortunately RDDL is still not seen as a definitive answer to the "What's At the End of a Namespace?" question, as continuing debate on the www-talk mailing list demonstrates. Application-level support for RDDL is also disappointingly thin on the ground, although several projects (e.g. Examplotron) use it to document their namespaces. RDDL is also obviously limited to describing namespace-related resources: there's no way to associate resources with XML vocabularies that don't use namespaces. To be clear, this isn't an RDDL restriction per se but reflects the absence of a standard way to associate a vocabulary with its resource directory.
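For readers who haven't seen RDDL, the sketch below shows how an application might pull resources out of an RDDL document with Python's `xml.etree.ElementTree`. Each `rddl:resource` element carries XLink attributes: `xlink:role` gives the resource's nature, `xlink:arcrole` its purpose, and `xlink:href` its location. The sample document is cut down and hypothetical: the schema-validation purpose URI comes from RDDL, but the second purpose URI and both file names are illustrative only.

```python
import xml.etree.ElementTree as ET

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Cut-down RDDL document; file names and the example.org purpose are made up.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource xlink:type="simple"
        xlink:role="http://www.w3.org/2001/XMLSchema"
        xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
        xlink:href="example.xsd"/>
    <rddl:resource xlink:type="simple"
        xlink:role="http://www.w3.org/1999/XSL/Transform"
        xlink:arcrole="http://example.org/purposes#render"
        xlink:href="example-to-html.xsl"/>
  </body>
</html>"""


def list_resources(rddl_text):
    """Return (nature, purpose, href) triples for every rddl:resource."""
    root = ET.fromstring(rddl_text)
    triples = []
    for res in root.iter(f"{{{RDDL_NS}}}resource"):
        triples.append((
            res.get(f"{{{XLINK_NS}}}role"),
            res.get(f"{{{XLINK_NS}}}arcrole"),
            res.get(f"{{{XLINK_NS}}}href"),
        ))
    return triples


def find_by_nature(rddl_text, nature):
    """Return the hrefs of all resources with the given nature URI."""
    return [href for n, _, href in list_resources(rddl_text) if n == nature]
```

Note that the lookup starts from a namespace document; a vocabulary without a namespace has nowhere to hang such a directory, which is exactly the gap described above.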
While both of these efforts are moving in the right direction, they don't offer complete solutions. In the recent XML-DEV discussion, Tim Bray described the lack of resource management as a "horrible problem" and expressed surprise that it hasn't been resolved yet.
...the infrastructure has really lousy support for dealing with multiple related resourc[es] that you need to bring together to do a job. There are little bits & pieces of machinery around: multipart-mime, RDDL, etc. Interestingly, they tried to start a "packaging" activity up over at W3C but it expired for lack of interest.
I've been kind of surprised that there isn't more energy being pointed at this problem. Still am.
The community has discussed this problem before, as the Deviant has previously reported. At the time the W3C didn't have the resources to devote to a packaging effort, and so could only go as far as publishing the available background material and setting up a public mailing list in the hope that the community would work on the problem. Unfortunately the community didn't rally around the issue. Promisingly, discussions on XML-DEV this week show that there is some renewed interest.
During the discussion, Rick Jelliffe clearly defined the current problem, acknowledging that while interoperability between XML applications can be good, the barrier to entry can still be high.
I believe the root of the problem is that there is no vendor-neutral way to distribute XML applications. We have all sorts of formats for bits and pieces: schemas, transformation scripts, stylesheets, digital rights, not to mention the zillion proprietary plug-in formats.
Yet there is no way I can say to my friend "Here is a file with all the resources needed for you to work with DOCBOOK: you can just plug it into your XML system (editor, composer, database, web application, etc. etc.) and you can start using it straight away."
At the moment, we are little better than in SGML days: we have a choice of many more tools but they still take too much effort to set up. Once set up, we have interoperability of data between our application and someone else's, but the establishment costs are still too high. So a lot of the potential of standard generalized markup languages has not been realized yet, because of this inflexibility.
A pithier version: how can we make XML convenient to deploy? This is a useful perspective, highlighting the importance of hiding the complexity of an XML application and the resources it requires so that it is largely transparent to the end user, who, after all, just wants to get the job done. Working with catalogs or RDDL is currently a task for developers, and ones well-versed in XML at that. A packaging format would fulfill a role similar to that of Java's Web Application Archive (WAR) and Enterprise Application Archive (EAR) formats.
Jelliffe, who has been particularly active in this thread, described the problems with current packaging efforts and presented a lightweight alternative (DZIP) with the following characteristics:
- Low tech and flexible
- Only aimed at "document types" or "document type applications" (i.e. metadata) not whole documents
- Provides space for integrators to add parts for different platforms
Jelliffe's goal is to provide a very simple packaging format that can be quickly assembled with current tools. A ZIP or JAR-based format meets both of these criteria. Gavin Nicol, giving the format the name XAR (XML Application Archive), outlined some of the advantages of using a JAR-based format.
The notion of XAR came from recognising that ZIP formats like this are the de facto standard for Java packaging, and they have lots of wonderful properties:
- They nicely avoid all the MIME nastiness with regard to packing of textual content.
- They already support the notion of package metadata via the META-INF directory.
- They can be compressed.
- They can be made secure, for example by encrypting all the ZIP content and then providing the public key in the META-INF folder.
etc. etc. etc.
I think this is something we (as a community) should heartily embrace.
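How little machinery a ZIP-based bundle needs is easy to demonstrate. The sketch below uses Python's standard `zipfile` module to build a compressed archive with a manifest under `META-INF`, then reads it back. There is no XAR specification, so the manifest name `META-INF/xar-manifest.xml`, the archive layout, and the function names are all hypothetical; the manifest content is passed in as an opaque string.

```python
import io
import zipfile

# Hypothetical manifest location; no XAR specification defines one.
MANIFEST_PATH = "META-INF/xar-manifest.xml"


def build_xar(resources, manifest_xml):
    """Bundle resources into a compressed ZIP archive with a META-INF manifest.

    resources maps archive paths (e.g. "xsl/html.xsl") to bytes.
    Returns the archive as bytes.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # A str argument is stored UTF-8 encoded.
        zf.writestr(MANIFEST_PATH, manifest_xml)
        for path, data in resources.items():
            zf.writestr(path, data)
    return buf.getvalue()


def read_manifest(xar_bytes):
    """Return the manifest text from an archive produced by build_xar."""
    with zipfile.ZipFile(io.BytesIO(xar_bytes)) as zf:
        return zf.read(MANIFEST_PATH).decode("utf-8")
```

For example, `build_xar({"schemas/docbookx.dtd": b"...", "xsl/html.xsl": b"..."}, "<manifest/>")` yields an archive that any ZIP tool, including `jar`, can list and unpack.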
While there are obvious advantages for Java developers, David Brownell noted that other platforms ought to be able to catch up very quickly, since APIs for manipulating ZIP files are quite common.
Several other formats were also suggested as possible candidates. James Clark noted that OpenOffice uses a ZIP-based format for packaging documents, and also recommended taking a closer look at DIME, a packaging format used as part of the Web Services Routing Protocol. Clark commented that the DIME format is both very simple and well designed. Henrik Frystyk Nielsen provided some further pointers and explained that DIME is being submitted as an Internet Draft. However, DIME lacks the widespread tool support enjoyed by ZIP-based formats, which is likely to limit its success as an alternative.
Some preferred to break down the decision on a packaging format into two separate questions: the format for bundling the resources (ZIP seems to have the most popular support here) and the manifest format used to describe the contents of a bundle. Jun Fujisawa observed that the manifest is the most important component.
People tend to concern about which packaging method is best suited for packaging XML documents with related resources. I'd like to suggest that it is more important to have a common and standard manifest format (presumably specified in XML) which can be used combined with all of the above packaging method.
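Fujisawa's point, that the manifest matters more than the bundling method, can be illustrated by keeping the two apart. The sketch below reads a manifest and indexes its resources by role, without caring whether the manifest arrived in a ZIP file, a MIME multipart message, or anything else. The manifest vocabulary (the `urn:example:xml-app-manifest` namespace, the `manifest` and `resource` elements, and the `doctype`, `role`, and `href` attributes) is invented for illustration and does not come from any of the proposals discussed.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest vocabulary; not taken from any published proposal.
MANIFEST_NS = "urn:example:xml-app-manifest"

SAMPLE_MANIFEST = f"""<manifest xmlns="{MANIFEST_NS}" doctype="DocBook XML 4.1.2">
  <resource role="dtd" href="schemas/docbookx.dtd"/>
  <resource role="xslt" href="xsl/html.xsl"/>
  <resource role="css" href="css/docbook.css"/>
</manifest>"""


def index_manifest(manifest_xml):
    """Return (doctype, {role: [href, ...]}) for a manifest document."""
    root = ET.fromstring(manifest_xml)
    index = {}
    for res in root.findall(f"{{{MANIFEST_NS}}}resource"):
        # Several resources may share a role (e.g. alternative stylesheets).
        index.setdefault(res.get("role"), []).append(res.get("href"))
    return root.get("doctype"), index
```

An editor or database loading a package would only need to understand this manifest; the container format could change without affecting it.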
Along these lines, Jonathan Borden promoted the use of RDDL as a means for specifying the contents of a resource, while Garret Wilson proposed XPackage, an RDF and XLink based format currently under consideration as a generalised bundling mechanism for the Open EBook format. Wilson outlined the use cases for which the format had been defined.
Interest levels certainly seem high, and a series of proposals provides a pool from which a standard could emerge. The usual avenues are open to achieving this, but an OASIS Technical Committee seems to have the right characteristics of being sufficiently "official" to allow a format to gain traction, while allowing a rapid development cycle. However, as Rick Jelliffe observed, simply standardizing on a format isn't sufficient:
...even if there is an OASIS group that specs it, would vendors get on board? Or would it expose that while they are happy to have document exchange using XML being flexible, but not so happy if it is easy to change applications (and therefore to change products)? Actually, I don't believe that XML tools-vendors are at all happy that setting up applications can be such a major task, nor that they can be difficult to maintain (i.e. with XML one gets a lot of flexibility in the different products we can connect together, but once we have done the work to connect them it is still a big thing to change or update them.) But they would not be expected to move in this area until there is a level of user demand or until they reach the stage in their business where deployability and maintainability become important selling issues. The more that XML products are easy to configure and update, the more that they can be successful on the desktop as well as the back-end.
The lack of progress on packaging suggests that this user pressure has yet to be applied, or that the market is not yet mature enough for factors other than rote lists of specification support to become selling points. Packaging seems like very low-hanging fruit that the XML community could pluck quickly. Whether XAR shares the fate of other useful (but unglamorous) proposals and gets thrown on the discarded ideas pile remains to be seen. But for XML to really become an invisible part of the infrastructure, these fundamental issues need resolving.