June 15, 2005
"In Straylight, the hull's inner surface is overgrown with a desperate proliferation of structures, forms flowing, interlocking, rising toward a solid core..." — William Gibson
To open with a quote like, "The great thing about standards is that there are so many to choose from," would be too cliché; besides, the gritty technocratic description from Neuromancer feels more appropriate to this week's discussion.
Most people would agree that some level of standardization is important, particularly for technologies such as XML, yet in the same breath they'd express frustration at not being able to keep up with all the standards activity. At the W3C, standards-in-progress often receive little attention until the second or third "last call" drafts are released. In some circles, "standard" has even become a sort of epithet, something hurled as an insult. A similar problem in a related area might help show us the way to a solution.
The OSI Example
OSI, the Open Source Initiative, maintains among other things a list of approved open source licenses. The list has been a popular and useful feature; when companies or volunteers announce an open source project, for example, it's common for them to highlight their use of an OSI-approved license.
Unfortunately, the great success of this program has led to problems. It's easy for project leaders or legal staff to conclude that an existing license doesn't quite meet their unique and special needs. Or a corporate benefactor might like the PR value of having a license that includes the company name. Maybe they need to make just a few small tweaks to the legal language; making such changes is called a "fork," whether it is done to a license or to the software underneath. Of course, the result is then no longer an OSI-approved license, so the slightly revised version gets submitted to OSI for possible inclusion on the approved list. The larger that list gets, the less useful (and more daunting) it becomes. It's rare for a project to fall entirely under a single license, and different licenses interact with one another in confusing ways with possible legal ramifications. Proliferation, along with the combinatorial explosion of possible interactions, had officially become a "significant barrier to open source deployment," a problem that needed to be addressed.
OSI now has a page describing the background and current policy dealing with license proliferation. In summary, the new policy sets additional criteria for approving a license and ranks existing licenses, possibly deprecating some. In the early years, OSI encouraged experimentation in the area of licensing, but now newly submitted licenses are weighed against additional checks:
The license must not be duplicative. The submitter must demonstrate that the license solves a problem not sufficiently addressed by an existing approved license.
The license must be clearly written, simple, and understandable.
The license must be reusable. Modularity is possible, even in legal documents, where the names of specific projects, copyright holders, and so on can be included by reference in an attachment.
The OSI effort has been paying off: companies like Intel have been applauded for withdrawing their company-specific licenses in favor of more general ones. Can the same lessons apply to the vast field of standards development?
Parallels from XML-related Standards
As has been the case with licensing, existing standards often fragment under similar repellent forces. It's easy for project leaders or technical staff to conclude that an existing standard doesn't meet their unique and special needs. Or company executives might prefer a situation where the company wields ultimate power over a standard that will become widely used. It's rare for a project to involve only a single standard, and multiple standards interact with one another in confusing ways with possible legal (and, perhaps, technical) ramifications.
A number of efforts underway aim to reduce the number of XML standards in the universe. One such effort falls under the broad umbrella of microformats. Why create new specifications for outlines (or contact information, or bookmarks, or calendar information, or even syndication formats) when existing formats can be used, usually as a subset, to accomplish all the same things?
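To make the microformats idea concrete, here is a minimal sketch, assuming an outline written as ordinary nested XHTML lists in the style of the XOXO outline microformat. The sample content and the `walk` and `outline_lines` helpers are hypothetical, for illustration only. The point is that a generic XML parser (here Python's standard `xml.etree.ElementTree`) can read the outline without any outline-specific specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an outline expressed as plain nested XHTML lists,
# in the style of the XOXO microformat (class="xoxo" on the outer list).
# No XHTML namespace is declared, so tag names are plain "ol" and "li".
OUTLINE = """
<ol class="xoxo">
  <li>Standards proliferation
    <ol>
      <li>Licensing (OSI)</li>
      <li>XML-related standards</li>
    </ol>
  </li>
  <li>Announcements</li>
</ol>
"""


def walk(list_elem, depth=0):
    """Yield (depth, text) for each <li>, descending into nested <ol>/<ul>."""
    for li in list_elem.findall("li"):
        yield depth, (li.text or "").strip()
        for child in li:
            if child.tag in ("ol", "ul"):
                yield from walk(child, depth + 1)


def outline_lines(xml_text):
    """Return the outline as indented strings, two spaces per level."""
    root = ET.fromstring(xml_text.strip())
    return ["  " * depth + text for depth, text in walk(root)]


for line in outline_lines(OUTLINE):
    print(line)
```

Because the outline is just XHTML lists, a browser can also render it directly, which is the microformats argument in miniature. A real XHTML document that declares the XHTML namespace would need namespace-qualified tag names in `findall` and the tag comparison; the sketch omits that for brevity.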
In fact, a great deal of current work is aimed at reducing proliferation. XML itself meets a set of requirements that might otherwise have been filled by smaller, incompatible, more specialized syntaxes. XHTML 2.0 can be thought of as a kind of universal format that will work well in many situations that might otherwise call for one-off vocabularies. Even when more specific elements are needed outside the official standard, the modularization framework underlying XHTML provides a clean interface for adding them. UBL, the Universal Business Language, and OASIS Open Document, as used by at least two office suites, are other examples. As is often the case, standards in this category tend to have names containing words like "Extensible" and "Universal".
Other kinds of standardization work can be thought of as separative efforts, in effect saying, "The existing stuff isn't good enough, so we need this..." or, "Sometimes, general purpose is too general." The push for a binary variant of XML is a good example of this kind of thinking. Arguably, so are the XML formats recently announced for Microsoft Office 12, some of the work from the WHAT WG, and the alphabet soup of overlapping Web Services standards. Developers have learned to regard these things with healthy skepticism. Just as we have seen with licensing, the need for forking has often been overstated, and joining current, ongoing work is the better strategy. For anyone evaluating the worth of a particular standard, I suggest that the same three criteria adopted by OSI make a good touchstone:
It shouldn't be duplicative of another more-established standard.
It should be clearly written and easy to understand.
It should reuse existing standards instead of reinventing things.
Even on these three points, various parties will come to differing conclusions. For example, strong supporters of binary XML will readily claim that their favorite proposal meets all three, while binary XML opponents will claim that it falls short of at least the first and third. Other cases are just plain hard to classify. Any version 2.0 specification could be viewed as duplicative of earlier versions of the same specification. And the specific case of Relax NG is worth noting: it overlaps somewhat with XML Schema, but it is nevertheless gaining widespread adoption, including in normative parts of the latest XHTML 2.0 draft.
The kinds of proliferation discussed this week are good problems to have, but serious problems still. With licensing, OSI serves as a central organizing force able to make needed changes. The wide world of standardization, however, is decentralized enough that no player is in a similar position.
Standards bodies exist in what is often an alien ecology. Ultimately, though, they either respond to the needs of standards users, including readers of this column, or else fade into obscurity. The criteria outlined here should give readers more tools by which to evaluate standards, and hopefully lead to some success stories like the ones coming out of OSI.
Births, Deaths, and Marriages
Quite a few announcements from the queue.
An XML editor and XSLT 1.0 and 2.0 debugger.
Topologi announces updates across their entire line of inexpensive XML utilities.
A W3C workshop to gather concrete reports and examine the full range of usability, implementation, and interoperability problems around the specification and its test suite.
New major release of this popular editor, including a graphical schema editor (for both XSD and Relax NG) and a new XML-diff engine.
Live, and ready for interested parties to start adding information.
New release forthcoming of the XML+CSS to PDF converter tool.
Deadline for late-breaking news is June 24.
Documents and Data
Python DOM compliance page, which Amelia Lewis pointed out.
Reusing XML Processing Code in non-XML Applications by Oleg Paraschenko
Robin Berjon's Experience with XSD