The Debut of XML:Geek

August 28, 1998

Peter Murray-Rust

We are proud to welcome our XML:Geek columnist, Peter Murray-Rust, who wrote the JUMBO XML parser and co-manages the XML developers' mailing list (XML-DEV).


After many years of agonized hacking, I now believe that the information revolution has finally arrived. We now have the tools we need to express our creativity with a reasonable chance of reaching 'most' places on the planet. I am very excited at the opportunity this column gives me to try out some of those ideas and to enthuse you with the real potential of XML. As the weeks go by, the style of the column will mutate, particularly in response to your views.

Doing Something New with XML

This column - XML:geek - is dedicated to those of you who want to do something new and innovative on the Internet. An XML:geek has a message waiting to be spread, and we believe that XML can make it happen. It's not effortless to do this - creativity isn't - but it should be very exciting. XML:geeks are primarily motivated by communicating, and XML is the next logical phase after the success of HTML.

This column is not about 'what is XML?' but about 'how do I make XML do something?'. In the next few issues I'll give ideas and examples of how to make the existing components work for you. You will get your hands dirty with pointy brackets and may also need to do some coding. However, there are lots of great tools available for free, and many more are coming, so XML:geekdom will often mean gluing these together.

I believe that XML will generate the same sort of worldwide community as the one spawned in 1993 and 1994, and that it will last longer. It'll be based on freeware, using existing programs and allowing your own to be used and modified. It'll have the same ethos as GNU, Perl, Tcl, Python, LaTeX, Linux and many more: a place where geeks read lists, create communal cores and kernels and bolt new and exciting things onto them. These free XML systems will certainly be co-distributed with Linux, LaTeX, Perl and others.

Why not just wait for the level 5 browsers? Fine, if you want others to do the thinking and the implementation. Word, Excel, Office, drawing packages, mailers, spellcheckers - all these will soon emit and consume XML, to everyone's benefit. But geeks are creators, not consumers, and the opportunities are enormous. So this column will be about how geekdom can create its own browser.

Things Are Getting Better

Why am I so optimistic? One reason is that there is a real synergy with progress in other WWW tools. Three years ago I had nightmares trying to distribute an SGML system for chemistry: six SGML files, an executable for every different platform, libraries, makefiles, scripts, etc. Now it's written in Java, packed into a jar or other installer, and can be sent to any modern platform anywhere. So XML:geeks don't have to carry the burden of multiple files, platforms, OSs, etc. Also, you have to write less from scratch (assuming you can't or won't pay for software). Unlike C++ - where you had to make your own bricks - Java provides an enormous class library for almost all common operations. Interoperability is now the buzzword, so everything is much more likely to bolt together. Help is freely offered on many lists.

So what's so exciting? XML provides a data structure that is ideal for a huge number of problems. Maybe relational databases were a necessary step in evolution, but they aren't easy for many people to understand. Many common objects map poorly onto relational models, which makes the resulting tables convoluted and hard to work with. In contrast, XML maps naturally onto hierarchies and links. Hierarchies appear to be a fundamental way of human thought, manifested in countless classifications and taxonomies. Simply by writing an XML file, you are structuring information. Links are endlessly flexible and can represent any data structure.
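To make the 'hierarchies and links' point concrete, here is a minimal Java sketch using the DOM parser that ships with modern JDKs (JAXP). The taxonomy document, its element names and the id/seeAlso link convention are hypothetical, invented only for illustration. Nesting carries the hierarchy; the link is an attribute that names another element's id. Because there is no DTD, the parser does not know that id is an ID attribute, so the sketch indexes ids by hand rather than relying on getElementById.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeAndLinks {
    // Hypothetical data: nesting expresses the hierarchy, while the
    // "seeAlso" attribute links one element to another by its "id".
    static final String XML =
          "<taxonomy>"
        + "<group id='mammals' name='Mammals'>"
        + "<species id='cat' name='Cat'/>"
        + "<species id='dog' name='Dog' seeAlso='cat'/>"
        + "</group>"
        + "</taxonomy>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse: " + e.getMessage(), e);
        }
    }

    // Use the "name" attribute as a label, falling back to the tag name.
    static String label(Element e) {
        String name = e.getAttribute("name");   // "" when the attribute is absent
        return name.isEmpty() ? e.getTagName() : name;
    }

    // Depth-first walk of the element tree, indenting two spaces per level.
    static void outline(Element e, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(label(e)).append('\n');
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                outline((Element) child, depth + 1, out);
            }
        }
    }

    // Index every element that carries an "id" so links can be followed.
    static Map<String, Element> indexIds(Document doc) {
        Map<String, Element> ids = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("id")) {
                ids.put(e.getAttribute("id"), e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        StringBuilder out = new StringBuilder();
        outline(doc.getDocumentElement(), 0, out);
        System.out.print(out);

        // Follow the link from "dog" to whatever its seeAlso attribute names.
        Map<String, Element> ids = indexIds(doc);
        String target = ids.get("dog").getAttribute("seeAlso");
        System.out.println("Dog -> see also -> " + label(ids.get(target)));
    }
}
```

The same two ideas, a tree plus pointers into it, are enough to describe structures that would need several joined tables in a relational design.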

HTML dealt with human2human communication through text and graphics, and it is superb for these processes. But it's lousy for conveying structure, useless for identifying what things are, and almost certain to break for machine2machine problems. That's a problem XML can solve, and it's what I hope to explore in XML:geek.

XML came out of SGML, and SGML was primarily aimed at text processing. So there's naturally an emphasis on text in the early specs for XML (xml:lang, xml:space, the Document Object Model, XSL, etc. are all text-oriented). But XML is just as good at non-textual applications such as:

  • mathematics (MathML)
  • machine services (WIDL)
  • log files (XLF)
  • push technology (CDF)
  • data (XML-data and DCD)
  • semantic networks (RDF)
  • chemistry (CML)
  • biology (BSML)
  • graphics (VML, PGML, etc.)
  • multimedia (SMIL)
  • music, genealogy, terminology, menus, taxonomy, etc.
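As a small illustration of XML carrying data rather than prose, the Java sketch below reads a tiny molecule. The markup is hypothetical and only loosely inspired by CML; it is not the real CML schema. Atoms are elements, bonds link atoms by id, and a few lines of DOM code compute an element count and check that every bond points at a real atom. Sorting the formula alphabetically is a simplification of the Hill convention chemists normally use.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MoleculeData {
    // Hypothetical markup loosely inspired by CML (not the real CML schema).
    static final String WATER =
          "<molecule title='water'>"
        + "<atom id='a1' elementType='O'/>"
        + "<atom id='a2' elementType='H'/>"
        + "<atom id='a3' elementType='H'/>"
        + "<bond atomRefs='a1 a2'/>"
        + "<bond atomRefs='a1 a3'/>"
        + "</molecule>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse: " + e.getMessage(), e);
        }
    }

    // Count atoms per element type; TreeMap keeps the keys in alphabetical order.
    static String formula(Document doc) {
        Map<String, Integer> counts = new TreeMap<>();
        NodeList atoms = doc.getElementsByTagName("atom");
        for (int i = 0; i < atoms.getLength(); i++) {
            counts.merge(((Element) atoms.item(i)).getAttribute("elementType"), 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sb.append(entry.getKey());
            if (entry.getValue() > 1) {
                sb.append(entry.getValue());   // a count of 1 is left implicit
            }
        }
        return sb.toString();
    }

    // Every id listed in a bond's atomRefs must belong to some atom.
    static boolean bondsAreConsistent(Document doc) {
        Set<String> atomIds = new HashSet<>();
        NodeList atoms = doc.getElementsByTagName("atom");
        for (int i = 0; i < atoms.getLength(); i++) {
            atomIds.add(((Element) atoms.item(i)).getAttribute("id"));
        }
        NodeList bonds = doc.getElementsByTagName("bond");
        for (int i = 0; i < bonds.getLength(); i++) {
            String refs = ((Element) bonds.item(i)).getAttribute("atomRefs").trim();
            for (String ref : refs.split("\\s+")) {
                if (!atomIds.contains(ref)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse(WATER);
        System.out.println("formula: " + formula(doc));
        System.out.println("bonds consistent: " + bondsAreConsistent(doc));
    }
}
```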

So the question we'll ask on XML:geek is: 'How can I do something fundamentally new with XML, and where can I get the tools and components to help?'

There's No FedEx to Geekdom

You won't become an XML:geek overnight. Books like 'XML in 7 days' are meaningless in geek philosophy. It took me years to 'get under the skin of' SGML and then XML. Understanding will grow slowly but steadily. Here's how to start.

Get the coffee in. Cancel the trans-Saharan expedition. Be prepared to lose friends. Buy books if you like reading books, but be aware that no book can really give the grand picture.

Get software. It's all free, especially if you're prepared to run Java. Most software comes with demos. Run them till you really understand what's happening.

Read the specs. There are geeks who can learn everything from a spec - I'm not one. But much of what you need is in the spec. Specs are sometimes dense, and often what is not said is important. The worst ignominy for a geek is asking an elementary question and being referred to the spec. This is especially true for questions about XML well-formedness (WFness) or validity. Remember the parser writer: they had to understand the spec in detail, and all that knowledge is in the parser. So: 'is this legal?' - try it. 'What does X do?' - create some examples and look at the output.
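In that 'try it' spirit, a few lines of Java are enough to ask a parser whether a fragment is well-formed. This is a minimal sketch using the SAX parser built into the JDK; the class name IsThisLegal and the method check are hypothetical. It only tests well-formedness: the default parser is non-validating, so checking validity would additionally need a DTD and SAXParserFactory.setValidating(true).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IsThisLegal {
    // Returns null if the text is well-formed XML, otherwise the parser's complaint.
    static String check(String xml) {
        try {
            // DefaultHandler ignores content but rethrows fatal (well-formedness) errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b/></a>",              // properly nested
            "<a><b></a>",               // b is never closed
            "<a x='1' x='2'/>",         // attribute given twice
            "<a>Fish &amp; chips</a>",  // escaped ampersand
            "<a>Fish & chips</a>"       // bare ampersand
        };
        for (String s : samples) {
            String problem = check(s);
            System.out.println(s + "  ->  " + (problem == null ? "well-formed" : problem));
        }
    }
}
```

The parser's message, with its line and column, is usually the quickest pointer to the rule in the spec that was broken.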

Many people ask, 'Where can I get an XML browser?' XML can and will be used to present text-based documents in many renderings and for many tag sets. The major current browsers will certainly cater for this, using mixtures of HTML and XML, with stylesheets and metadata. But XML has so much more power that there will be many types of 'browser'. My own efforts are aimed at creating an element-based browser - JUMBO - which can load specialized Java classes for any element in a document. XML can also be used declaratively to do what programs do, such as creating menus or presenting action lists. XML will be much more interactive and distributed than HTML, and there will be a lot more to choose from.
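The idea of a browser that attaches behaviour to individual elements can be sketched with a SAX handler that looks up a handler object for each element name. This is not JUMBO's code or API: ElementDispatcher, Behaviour and the menu/item element names are hypothetical. A real element-based browser would more likely load a class by name (for example via reflection); here small lambdas stand in for those classes. The menu document shows the declarative style: the XML says what to build, and the registered behaviours say how.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ElementDispatcher extends DefaultHandler {
    // Hypothetical contract: one behaviour per element name (not JUMBO's API).
    interface Behaviour {
        String act(Attributes atts);
    }

    private final Map<String, Behaviour> behaviours = new HashMap<>();
    private final List<String> actions = new ArrayList<>();

    void register(String elementName, Behaviour behaviour) {
        behaviours.put(elementName, behaviour);
    }

    // Called by the SAX parser at each start tag; qName is the tag as written.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Behaviour b = behaviours.get(qName);
        actions.add(b != null ? b.act(atts) : "[no behaviour for " + qName + "]");
    }

    // Parse the document, letting each element trigger its registered behaviour.
    List<String> run(String xml) {
        actions.clear();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse: " + e.getMessage(), e);
        }
        return new ArrayList<>(actions);
    }

    // Declarative menu: the XML describes the menu, the behaviours render it.
    static ElementDispatcher menuBrowser() {
        ElementDispatcher d = new ElementDispatcher();
        d.register("menu", atts -> "Menu: " + atts.getValue("title"));
        d.register("item", atts -> "  - " + atts.getValue("label"));
        return d;
    }

    public static void main(String[] args) {
        String xml = "<menu title='File'><item label='Open'/><item label='Save'/><note/></menu>";
        menuBrowser().run(xml).forEach(System.out::println);
    }
}
```

Elements without a registered behaviour fall back to a generic placeholder, which is one simple way to let a browser cope with tag sets it has never seen.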

If you are still reading, you're geek material. The only two URLs you need (other than …) are:

XML geeks love organizing information, so in these you'll find easy navigation and very little duplication of resources. There are separate sites for software, examples, history, recent news, discussion, etc. [Henry Rzepa and I run the XML Developers' list - we keep it focused on actually creating things, so maybe we'll see you there. Other lists are more discursive, introductory, etc. Many of them welcome beginners' questions; XML-DEV is not really suited for that.]

I'd like feedback. Be prepared for ideas to be shared and mutated, so don't feel possessive. XML is a communal publishing system of enormous power and will be seen as revolutionary by many. If I'm right about XML, we can generate radical new stuff right here. But please don't mail me with 'Where can I get X?', 'What's a good book?' or 'I can't get X to run under OS Y'.

So - install one (or more than one) parser. Try out the DOM, XSL and RDF, remembering that these are still experimental. Make your XML ring bells, flash lights, drive model trains. Interface it to LaTeX or Linux. Have fun. Next issue will get geekier.
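For a first experiment with the DOM, the sketch below builds a small document in memory and serializes it with the JDK's identity Transformer. The layout, train and signal elements are hypothetical toy-railway markup; connecting them to an actual model railway is not shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToy {
    // Build a small tree node by node, then serialize it back to markup.
    static String build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element layout = doc.createElement("layout");
            doc.appendChild(layout);

            Element train = doc.createElement("train");
            train.setAttribute("speed", "slow");
            train.appendChild(doc.createTextNode("Engine 1"));
            layout.appendChild(train);

            Element signal = doc.createElement("signal");
            signal.setAttribute("state", "green");
            layout.appendChild(signal);

            // An identity Transformer copies the DOM tree to a character stream.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Changing an attribute or moving a node and re-running is a quick way to see how the DOM tree and the serialized text correspond.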