October 2, 1997

Rohit Khare and Dan Connolly


The hypertext markup language is an SGML format.

The result of that design decision is something of a collision between the World Wide Web development community and the SGML community--between the quick-and-dirty software community and the formal ISO standards community. It also creates a collision between the interactive, online hypermedia technology and the bulk, batch print publications technology.

XML: Principles, Tools, and Techniques heralds the appearance of Extensible Markup Language (XML), a simple, powerful subset of Standard Generalized Markup Language (SGML). In this issue of the World Wide Web Journal you'll find the complete technical specifications, primers, implementation case studies, applications, and even historical and philosophical reflections on the emerging role of XML.

A Splendid Inheritance

SGML is a complex, mature, stable technology. The international standard, ISO 8879:1986, is over ten years old, and GML-based systems were established years earlier still. On the other hand, HTML is a relatively simple, new, and rapidly evolving technology.

When Tim Berners-Lee designed HTML, he chose to base it on SGML because SGML was an open, extensible technology that facilitated sound information management techniques. He knew extensibility was crucial because data formats on the Web would have to evolve as the system grew and changed. And openness was critical because he didn't want any single company or group to be able to prevent good ideas from other places from taking off.

Anyone who is familiar with HTML knows that its evolution has been far from graceful. What Tim didn't realize was that SGML was so complex and obscure that developers would guess what the standard said rather than looking it up--and they wouldn't always guess right.

The result is that our hard-won interoperability is based not on an open specification, but on the costly, primitive black art of reverse-engineering.

Another result is that today's Web software does not benefit from the extensibility of SGML: the market is full of specialized tools and applications that add tags to HTML for specific purposes--but the Web infrastructure as a whole does not accommodate these extensions.

XML is destined to remedy all that. It is a clean, simple dialect of SGML that developers can understand and implement consistently, and it provides extensibility--room to grow--beyond the centrally maintained set of HTML tags.

"Make the easy things easy and the hard things possible" is an established maxim in computer language design. HTML is successful in making the easy things easy, but if you have something that can't be done with the existing idioms in HTML, you might have to develop a completely new browser from scratch. Component technologies like plug-ins and Java are lowering the bar somewhat, true. But with the deployment of XML and stylesheets, developing your own look and even your own document structures becomes easy enough to appeal to just about any information provider who is willing to tinker around.

Punctuated Equilibrium

XML is the product of careful evolution. Contradictory as that may seem, that is W3C's mission: "Leading the Evolution of the Web." Unlike the invisible hand of Nature, red in tooth and claw, W3C's technical staff takes an active role in cultivating the garden of Web technologies. Years spent nurturing the idea of Generic SGML are about to pay off in an explosion of new species of documents, unleashing new applications to roam the Earth. But will this punctuation in the equilibrium of Web evolution drive its predecessors to extinction?

Beyond the engineering details of new Web technology, you can look to the Web Journal to provide context about the community behind the ideas and the broader role of standards bodies and industrial developments. In the first section of this book, we present a round table interview with members of the W3C design committees behind XML and a lively first-person argument from David Siegel on the virtues of structured information. In "W3C Reports," you'll also hear about Mathematical Markup Language, a flagship XML application, as well as the Document Object Model's hooks for client-side scripting to animate those elements.

In "Technical Papers," several contributors argue that XML is a natural complement to Java and Web automation technologies. In the same way that URLs are shared technology for pointing to and accessing resources in distributed applications, XML provides an interchange format with extremely broad appeal. Each time a common programming task is institutionalized as shared technology, it lowers the cost of applications development and increases efficiency and innovation.

This is not to say that HTML will fade away: not everyone wants to develop a new document structure, stylesheet, or Java applet just to put up a Web page. HTML will always be there making the easy things easy. But with XML, if you want to go beyond the boundaries of HTML, it will be straightforward to do just that.


The XML specification developers have taken on quite a challenge, and delivered specifications that work. But even they will tell you that the specification by itself is not sufficient to establish shared understanding in the community. Most of them are probably at a trade show or conference right now, teaching the basics or the nuances of XML to a few more people. They know well that another critical step to shared understanding is running code that developers can see, touch, run, use, and generally get their heads around. Some of their code is in this issue. Shared understanding in the community also involves an appreciation for the motivation, history, and rationale for the technology. We think we've captured some of that here too.

We'd like to thank those who made it possible for so many in the Web community--ourselves included--to study SGML and structured documents via the Internet by releasing software and documentation, maintaining FTP archives and hypertext bibliographies, and answering countless questions in USENET forums: Robin Cover, Erik Naggum, Dave Hollander, Conleth O'Connell, Darrell Raymond, Eliot Kimber, Lou Burnard, C.M. Sperberg-McQueen, and James Clark. And of course, a tip o' the hat to the dedicated O'Reilly staff for riding this bronco.

Dan Connolly, Austin, TX
Rohit Khare, Ellicott City, MD
September 19, 1997