Namespaces in XML Adopted by W3C

January 19, 1999

Mark Walter

The "Namespaces in XML" specification has been formally adopted by the W3C as a recommendation from its members and its director, Tim Berners-Lee, paving the way for vendors to create software that will more easily support the rich markup vocabularies made possible by XML.

What's the problem?
XML is a language for creating markup -- think of it as the way to write down new tag sets that your software will recognize and make use of. The whole point of XML is to let users create unique tags that identify their information in more meaningful ways than simply applying the basic set of HTML tags to all documents.

While this gives users great flexibility, it poses problems for interchange and software integration. What happens when two documents make use of the same tag names in different contexts? For example, a <PART> tag in an illustrated parts catalog identifies something quite different than a part in a dramatic play. Within a single document, the term "title" may refer to the document itself, the name of a book, and the formal appellation associated with its author (e.g., "Dr."). The problem is not just for element names; it extends to attributes as well.

This potential collision over different uses of the same names poses problems for anyone writing XML-based software and applications. It's difficult to write a style sheet that displays book titles in italic but gives no special formatting to people's titles if I don't have a good way to distinguish the two uses of the title tag.

A partial solution
The XML namespaces spec addresses this issue by allowing tags to have a context. That context is the tag's or attribute's XML namespace, which is simply a Web address. Because Web addresses are unique, they're a handy way to establish unique contexts. For example, you could create a namespace called EDI, linked to a URL; declare that namespace at the beginning of your XML document; and then add "EDI:" as a prefix to any element name in the document. The use of the declared prefix provides a way for software to treat tags with EDI prefixes differently than tags with different prefixes. You can also declare a default namespace at the start of your document; any tags without prefixes are assumed to be in the default namespace.
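
As a rough illustration of how software might treat EDI-prefixed tags differently from the rest, here is a minimal Python sketch using the standard library's ElementTree parser. The document, the two http://example.com/ URLs and the split_by_namespace helper are all hypothetical, invented only for this example; they are not part of the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name standing in for "a namespace called EDI".
EDI_NS = "http://example.com/edi"

# Hypothetical document: a default namespace plus a declared EDI prefix.
ORDER = """\
<order xmlns="http://example.com/orders" xmlns:EDI="http://example.com/edi">
 <title>Spring restock</title>
 <EDI:segment>BEG</EDI:segment>
 <EDI:segment>PO1</EDI:segment>
</order>
"""

def split_by_namespace(xml_text, ns):
    """Return (local names inside ns, local names outside ns) in document order."""
    inside, outside = [], []
    for el in ET.fromstring(xml_text).iter():
        # ElementTree reports namespaced tags as "{namespace-name}local-name".
        local = el.tag.split("}")[-1]
        if el.tag.startswith("{%s}" % ns):
            inside.append(local)
        else:
            outside.append(local)
    return inside, outside

edi_names, other_names = split_by_namespace(ORDER, EDI_NS)
print(edi_names)    # ['segment', 'segment']
print(other_names)  # ['order', 'title']
```

The unprefixed order and title elements land in the second list because they belong to the default namespace, not to the EDI one.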

One thing the namespace does not define is what the tags or attributes are, or what they mean. That larger effort, which would enable DTD designers to reference public tag sets, is part of what the XML schema working group is tackling. Further details on how namespaces operate can be seen in our accompanying story, and in the specification itself.

Further information about upcoming developments of XML is available at the W3C Web site.

The implications
All along, it has been envisioned that XML will be applied to specific vertical industries. Medicine has patient records; legal publishing has court cases and commentary; aerospace has its massive engines and planes. Many have presumed that XML wouldn't take off until vertical industries published their DTDs.

Public DTDs are useful, and ultimately we are likely to see more of them. In the meantime, though, some people writing their own XML applications (software acting on private DTDs) would like a way to at least resolve tag and attribute name conflicts. Knowing the meaning of the tags is required for true interchange, but for the purposes of creating style sheets for display, it may be enough to simply have a unique ID and make an educated guess about the meaning.

"Think of an invoice," suggests Dan Connolly, W3C XML Activity Lead. "Most of an invoice like the addresses and quantities and amounts are in regular commercial language. But maybe the descriptions of exactly what parts have been ordered would only be understood by experts manufacturing or using the parts. Still, many people can understand the invoice without having to understand what the part description means. XML namespaces allows a digitally coded document like this invoice to be processed -- without everyone who uses invoices having to agree on a vocabulary for turbojet engine side intake manifold monitor valve mounting nuts, or whatever."

With the namespace spec, what the W3C has said is that what is required is a standard way to resolve conflicts in tag names, so that programmers around the globe can work from a common understanding of how tags will be identified.

The XML namespace spec does provide this universal mechanism that everyone can use to create tag and attribute names that are unique in the context of specific documents, just as file names are unique in the context of the directory of a computer's hard disk.

Though namespaces operate "under the hood" and are really aimed at programmers, they will enable a variety of functions -- from interesting typographic treatment of elements to sophisticated processing of orders and invoices -- that all manner of Web users, even novices, will appreciate.

XML Namespaces by Example

January 14th saw the arrival of a new W3C Recommendation, Namespaces in XML. "Recommendation" is the final step in the W3C process; the status means that the document is done, frozen, agreed-upon and official.

Namespaces are a simple and straightforward way to distinguish names used in XML documents, no matter where they come from. However, the concepts are a bit abstract, and this specification has been causing some mental indigestion among those who read it. The best way to understand namespaces, as with many other things on the Web, is by example.

So let's set up a scenario: suppose XML.com wanted to start publishing reviews of XML books. We'd want to mark the info up with XML, of course, but we'd also like to use HTML to help beautify the display. Here's a tiny sample of what we might do:


<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <h:table>
    <h:tr align="center">
     <h:td>Author</h:td><h:td>Price</h:td>
     <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
    <h:tr align="left">
     <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
     <h:td><xdc:price>31.98</xdc:price></h:td>
     <h:td><xdc:pages>352</xdc:pages></h:td>
     <h:td><xdc:date>1998/01</xdc:date></h:td>
    </h:tr>
   </h:table>
  </xdc:bookreview>
 </h:body>
</h:html>

In this example, the elements prefixed with xdc are associated with a namespace whose name is http://www.xml.com/books, while those prefixed with h are associated with a namespace whose name is http://www.w3.org/HTML/1998/html4.

The prefixes are linked to the full names using the attributes on the top element whose names begin with xmlns:. The prefixes don't mean anything at all - they are just shorthand placeholders for the full names. Those full names, you will have noticed, are URLs, i.e. Web addresses. We'll get back to why that is, and what those are the addresses of, a bit further on.
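
To see that the prefix itself carries no meaning, here is a small Python sketch using the standard library's ElementTree parser. The two one-line fragments are our own illustration (the bk prefix is made up); the point is only that a namespace-aware parser reports the same full name for both.

```python
import xml.etree.ElementTree as ET

# Two illustrative fragments: same namespace name, different prefixes.
DOC_A = '<xdc:title xmlns:xdc="http://www.xml.com/books">XML: A Primer</xdc:title>'
DOC_B = '<bk:title xmlns:bk="http://www.xml.com/books">XML: A Primer</bk:title>'

# ElementTree expands each prefix into "{namespace-name}local-name".
tag_a = ET.fromstring(DOC_A).tag
tag_b = ET.fromstring(DOC_B).tag

print(tag_a)           # {http://www.xml.com/books}title
print(tag_a == tag_b)  # True
```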

Why Namespaces?

But first, an obvious question: why do we need these things? They are there to help computer software do its job. For example, suppose you're a programmer working for XML.com and you want to write a program to look up the books at Amazon.com and make sure the prices are correct. Such lookups are quite easy, once you know the author and the title. The problem, of course, is that this document has XML.com's book-review tags and HTML tags all mixed up together, and you need to be sure that you're finding the book titles, not the HTML page titles.

The way you do this is to write your software to process the contents of <title> tags, but only when they're in the http://www.xml.com/books namespace. This is safe, because programmers who are not working for XML.com are not likely to be using that namespace.
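
Here is one way such a program might pick out the book titles, sketched in Python with the standard library's ElementTree parser. The REVIEW string is a trimmed copy of the sample above, and book_titles is a name we made up for this sketch; the Amazon.com lookup itself is left out.

```python
import xml.etree.ElementTree as ET

BOOKS_NS = "http://www.xml.com/books"

# Trimmed version of the book-review sample, kept short for the sketch.
REVIEW = """\
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <xdc:author>Simon St. Laurent</xdc:author>
  </xdc:bookreview>
 </h:body>
</h:html>
"""

def book_titles(xml_text):
    """Return the text of every title element in the XML.com books namespace."""
    root = ET.fromstring(xml_text)
    # Match on the full "{namespace-name}title" name, so h:title is skipped.
    wanted = "{%s}title" % BOOKS_NS
    return [(el.text or "").strip() for el in root.iter(wanted)]

titles = book_titles(REVIEW)
print(titles)  # ['XML: A Primer']
```

The HTML page title "Book Review" never shows up in the result, because its full name lives in a different namespace.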

Attributes Too

Attributes, not just elements, can have namespaces. For example, let's use the HTML STYLE attribute to allow an HTML browser to display our book review:


<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title h:style="font-family: sans-serif;">
     XML: A Primer</xdc:title>
   <h:table>
    <h:tr align="center">
     <h:td>Author</h:td><h:td>Price</h:td>
     <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
    <h:tr align="left">
     <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
     <h:td><xdc:price>31.98</xdc:price></h:td>
     <h:td><xdc:pages>352</xdc:pages></h:td>
     <h:td><xdc:date>1998/01</xdc:date></h:td>
    </h:tr>
   </h:table>
  </xdc:bookreview>
 </h:body>
</h:html>
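
From a program's point of view, a prefixed attribute is looked up by its full name just as an element is. The Python sketch below (standard library ElementTree again) uses a one-element fragment cut down from the listing above; the fragment and variable names are ours.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/HTML/1998/html4"

# One element from the review, with both namespace declarations moved onto it.
FRAGMENT = """\
<xdc:title xmlns:xdc="http://www.xml.com/books"
           xmlns:h="http://www.w3.org/HTML/1998/html4"
           h:style="font-family: sans-serif;">XML: A Primer</xdc:title>
"""

title = ET.fromstring(FRAGMENT)

# The attribute is stored under "{namespace-name}style", not under "style".
style = title.get("{%s}style" % HTML_NS)
plain = title.get("style")

print(style)  # font-family: sans-serif;
print(plain)  # None
```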

Beautification

The example above is, perhaps, kind of ugly, with all those prefixes and colons cluttering up the tags. The Namespaces Recommendation allows you to declare a default namespace and leave out some prefixes, like this:


<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:xdc="http://www.xml.com/books">
 <head><title>Book Review</title></head>
 <body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <table>
    <tr align="center">
     <td>Author</td><td>Price</td>
     <td>Pages</td><td>Date</td></tr>
    <tr align="left">
     <td><xdc:author>Simon St. Laurent</xdc:author></td>
     <td><xdc:price>31.98</xdc:price></td>
     <td><xdc:pages>352</xdc:pages></td>
     <td><xdc:date>1998/01</xdc:date></td>
    </tr>
   </table>
  </xdc:bookreview>
 </body>
</html>

In this example, anything without a prefix is assumed to be in the http://www.w3.org/HTML/1998/html4 namespace, which we're using as the namespace name for HTML (presumably, now that namespaces are official, the W3C will give HTML an official namespace name).
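
A quick way to convince yourself that the prefixed and default-namespace forms say the same thing is to compare the names a namespace-aware parser reports. The Python sketch below uses the standard library's ElementTree; the two small documents are cut-down illustrations of ours, not the full listings. It also shows a detail of the Recommendation worth knowing: a default namespace applies to element names, while an unprefixed attribute such as align is not placed in it.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/HTML/1998/html4"

# Same structure twice: once with an explicit prefix, once with a default namespace.
PREFIXED = """\
<h:html xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:body><h:table><h:tr align="left"><h:td>Author</h:td></h:tr></h:table></h:body>
</h:html>
"""
DEFAULTED = """\
<html xmlns="http://www.w3.org/HTML/1998/html4">
 <body><table><tr align="left"><td>Author</td></tr></table></body>
</html>
"""

def names(xml_text):
    """List every element's full "{namespace-name}local-name" in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

same = names(PREFIXED) == names(DEFAULTED)
row = ET.fromstring(DEFAULTED).find(".//{%s}tr" % HTML_NS)

print(same)        # True
print(row.attrib)  # {'align': 'left'}
```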

What Do Namespace Names Point At?

One of the confusing things about all this is that namespace names are URLs; it's easy to assume that since they're Web addresses, they must be the address of something. That isn't necessarily so: these are URLs, but the Namespaces Recommendation doesn't care what (if anything) they point at. Think about the example of the XML.com programmer looking for book titles; that works fine without the namespace name pointing at anything.
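
The sketch below, in Python with the standard library's ElementTree parser, illustrates the point with a namespace name that cannot point at anything: it sits on the reserved .invalid domain, and the whole URL is hypothetical. Parsing still works, because the name is only ever compared as a string; a second fragment shows that even a trailing slash makes a different namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names; nothing exists at either address.
DOC = '<b:title xmlns:b="http://books.example.invalid/ns">XML: A Primer</b:title>'
OTHER = '<b:title xmlns:b="http://books.example.invalid/ns/">XML: A Primer</b:title>'

title = ET.fromstring(DOC)
other = ET.fromstring(OTHER)

# The parser never tries to retrieve the URL; it just records it in the name.
print(title.tag)  # {http://books.example.invalid/ns}title

# Namespace names are matched character by character.
same_namespace = title.tag == other.tag
print(same_namespace)  # False
```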

The reason that the W3C decided to use URLs as namespace names is that they contain domain names (e.g. www.xml.com), which work globally across the Internet.

Is That All There Is?

That's more or less all there is to it. The only purpose of namespaces is to give programmers a helping hand, enabling them to process the tags and attributes they care about and ignore those that don't matter to them.

Quite a few people, after reading earlier drafts of the Namespaces Recommendation, decided that namespaces were actually a facility for modular DTDs, or were trying to duplicate the function of SGML's "Architectural Forms". Neither of these theories is true. The only reason namespaces exist, once again, is to give elements and attributes programmer-friendly names that will be unique across the whole Internet.

Namespaces are a simple, straightforward, unglamorous piece of syntax. But they are crucial for the future of XML programming. Because this is important, we at XML.com will soon be posting an Annotated Namespaces, in a style similar to our Annotated XML 1.0.