Namespaces in XML Adopted by W3C
The "Namespaces in XML" specification has been formally adopted by the W3C as a recommendation from its members and its director, Tim Berners-Lee, paving the way for vendors to create software that will more easily support the rich markup vocabularies made possible by XML.
What's the problem?
XML is a language for creating markup -- think of it as the way to write down new tag sets that your software will recognize and make use of. The whole point of XML is to enable users to create unique tags that identify their information in more meaningful ways than simply applying the basic set of HTML tags to all documents.
While this gives users great flexibility, it poses problems for interchange and software integration. What happens when two documents make use of the same tag names in different contexts? For example, a <PART> tag in an illustrated parts catalog identifies something quite different than a part in a dramatic play. Within a single document, the term "title" may refer to the document itself, the name of a book, and the formal appellation associated with its author (e.g., "Dr."). The problem is not just for element names; it extends to attributes as well.
This potential collision over different uses of the same names poses problems for anyone writing XML-based software and applications. It's difficult to write a style sheet that displays book titles in italic but gives no special formatting to people's titles if I don't have a good way to distinguish the two uses of the title tag.
A partial solution
The XML namespaces spec addresses this issue by allowing tags to have a context. That context is the tag's or attribute's XML namespace, which is simply a Web address. Because Web addresses are unique, they're a handy way to establish unique contexts. For example, you could create a namespace called EDI, linked to a URL; declare that namespace at the beginning of your XML document; and then add "EDI:" as a prefix to any element name in the document. The use of the declared prefix provides a way for software to treat tags with EDI prefixes differently than tags with different prefixes. You can also declare a default namespace at the start of your document; any tags without prefixes are assumed to be in the default namespace.
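As a rough illustration of the EDI example, the sketch below parses a small document with Python's standard xml.etree.ElementTree module and sorts its elements by namespace. The namespace URLs (http://example.com/edi, http://example.com/orders) and the element names are hypothetical, invented only for this sketch; the helper split_name is likewise not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names and element names, for illustration only.
EDI_NS = "http://example.com/edi"
ORDERS_NS = "http://example.com/orders"

DOC = """<order xmlns="http://example.com/orders"
       xmlns:EDI="http://example.com/edi">
  <EDI:invoice>
    <EDI:amount>42.00</EDI:amount>
  </EDI:invoice>
  <note>Ship by Friday</note>
</order>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


root = ET.fromstring(DOC)
names = [split_name(el.tag) for el in root.iter()]

# Elements written with the EDI: prefix land in the EDI namespace;
# elements without a prefix land in the declared default namespace.
edi_elements = [local for ns, local in names if ns == EDI_NS]
default_elements = [local for ns, local in names if ns == ORDERS_NS]

print(edi_elements)
print(default_elements)
```

Software that only cares about the EDI vocabulary can work from edi_elements and ignore everything else.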
One thing the namespace does not define is what the tags or attributes are, or what they mean. That larger effort, which would enable DTD designers to reference public tag sets, is part of what the XML schema working group is tackling. Further details on how namespaces operate can be seen in our accompanying story, and in the specification itself.
Further information about upcoming developments in XML is available at the W3C Web site.
The implications
All along, it has been envisioned that XML will be applied to specific vertical industries. Medicine has patient records; legal publishing has court cases and commentary; aerospace has its massive engines and planes. Many have presumed that XML wouldn't take off until vertical industries published their DTDs.
Public DTDs are useful, and ultimately we are likely to see more of them. In the meantime, though, some people writing their own XML applications (software acting on private DTDs) would like a way to at least resolve tag and attribute name conflicts. Knowing the meaning of the tags is required for true interchange, but for the purposes of creating style sheets for display, it may be enough to simply have a unique ID and make an educated guess about the meaning.
"Think of an invoice," suggests Dan Connolly, W3C XML Activity Lead. "Most of an invoice like the addresses and quantities and amounts are in regular commercial language. But maybe the descriptions of exactly what parts have been ordered would only be understood by experts manufacturing or using the parts. Still, many people can understand the invoice without having to understand what the part description means. XML namespaces allows a digitally coded document like this invoice to be processed -- without everyone who uses invoices having to agree on a vocabulary for turbojet engine side intake manifold monitor valve mounting nuts, or whatever."
With the namespace spec, the W3C has said that what is required is a standard way to resolve conflicts in tag names, so that programmers around the globe can work from a common understanding of how tags will be identified.
The XML namespace spec does provide this universal mechanism that everyone can use to create tag and attribute names that are unique in the context of specific documents, just as file names are unique in the context of the directory of a computer's hard disk.
Though namespaces operate "under the hood" and are really aimed at programmers, they will enable a variety of functions -- from interesting typographic treatment of elements to sophisticated processing of orders and invoices -- that all manner of Web users, even novices, will appreciate.
January 14th saw the arrival of a new W3C Recommendation, Namespaces in XML. "Recommendation" is the final step in the W3C process; the status means that the document is done, frozen, agreed-upon and official.
Namespaces are a simple and straightforward way to distinguish names used in XML documents, no matter where they come from. However, the concepts are a bit abstract, and this specification has been causing some mental indigestion among those who read it. The best way to understand namespaces, as with many other things on the Web, is by example.
So let's set up a scenario: suppose XML.com wanted to start publishing reviews of XML books. We'd want to mark the info up with XML, of course, but we'd also like to use HTML to help beautify the display. Here's a tiny sample of what we might do:
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <h:table>
    <h:tr align="center">
     <h:td>Author</h:td><h:td>Price</h:td>
     <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
    <h:tr align="left">
     <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
     <h:td><xdc:price>31.98</xdc:price></h:td>
     <h:td><xdc:pages>352</xdc:pages></h:td>
     <h:td><xdc:date>1998/01</xdc:date></h:td>
    </h:tr>
   </h:table>
  </xdc:bookreview>
 </h:body>
</h:html>
In this example, the elements prefixed with xdc are associated with a namespace whose name is http://www.xml.com/books, while those prefixed with h are associated with a namespace whose name is http://www.w3.org/HTML/1998/html4.
The prefixes are linked to the full names using the attributes on the top element whose names begin with xmlns:. The prefixes don't mean anything at all - they are just shorthand placeholders for the full names. Those full names, you will have noticed, are URLs, i.e. Web addresses. We'll get back to why that is and what those are the addresses of a bit further on.
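To see that prefixes really are just placeholders, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It parses the same element twice, once with the xdc prefix and once with a prefix invented for this sketch (bk), and compares the names the parser reports. ElementTree's "{namespace}local" notation is that library's own convention, not something the Recommendation prescribes.

```python
import xml.etree.ElementTree as ET

# Same namespace name, two different prefixes. "bk" is an arbitrary
# prefix made up for this sketch.
with_xdc = ET.fromstring(
    '<xdc:title xmlns:xdc="http://www.xml.com/books">XML: A Primer</xdc:title>'
)
with_bk = ET.fromstring(
    '<bk:title xmlns:bk="http://www.xml.com/books">XML: A Primer</bk:title>'
)

# ElementTree drops the prefix and reports the full namespace name
# in braces, followed by the local name.
print(with_xdc.tag)
print(with_bk.tag)

same_name = with_xdc.tag == with_bk.tag
print(same_name)
```

Both elements come out with the identical name, because only the namespace name and the local part count.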
But first, an obvious question: why do we need these things? They are there to help computer software do its job. For example, suppose you're a programmer working for XML.com and you want to write a program to look up the books at Amazon.com and make sure the prices are correct. Such lookups are quite easy, once you know the author and the title. The problem, of course, is that this document has XML.com's book-review tags and HTML tags all mixed up together, and you need to be sure that you're finding the book titles, not the HTML page titles.
The way you do this is to write your software to process the contents of <title> tags, but only when they're in the http://www.xml.com/books namespace. This is safe, because programmers who are not working for XML.com are not likely to be using that namespace.
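A minimal sketch of that idea, again using Python's standard xml.etree.ElementTree module, might look like the following. The function name book_titles is hypothetical, and the document is a shortened copy of the sample above; the Amazon.com lookup itself is left out.

```python
import xml.etree.ElementTree as ET

BOOKS_NS = "http://www.xml.com/books"

# A shortened copy of the sample review.
REVIEW = """<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <xdc:author>Simon St. Laurent</xdc:author>
  </xdc:bookreview>
 </h:body>
</h:html>"""


def book_titles(xml_text):
    """Return the text of every title element in the book-review namespace."""
    root = ET.fromstring(xml_text)
    # Matching on "{namespace}title" skips the HTML page title, which has
    # the same local name but lives in a different namespace.
    return [(el.text or "").strip() for el in root.iter("{%s}title" % BOOKS_NS)]


titles = book_titles(REVIEW)
print(titles)
```

The HTML page title "Book Review" never shows up in the result, even though it is also written as a title tag.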
Attributes, not just elements, can have namespaces. For example, let's use the HTML STYLE attribute to allow an HTML browser to display our book review:
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
 <h:head><h:title>Book Review</h:title></h:head>
 <h:body>
  <xdc:bookreview>
   <xdc:title h:style="font-family: sans-serif;">
     XML: A Primer</xdc:title>
   <h:table>
    <h:tr align="center">
     <h:td>Author</h:td><h:td>Price</h:td>
     <h:td>Pages</h:td><h:td>Date</h:td></h:tr>
    <h:tr align="left">
     <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
     <h:td><xdc:price>31.98</xdc:price></h:td>
     <h:td><xdc:pages>352</xdc:pages></h:td>
     <h:td><xdc:date>1998/01</xdc:date></h:td>
    </h:tr>
   </h:table>
  </xdc:bookreview>
 </h:body>
</h:html>
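For programmers, a prefixed attribute is looked up the same way as a prefixed element: by namespace name plus local name. The sketch below assumes Python's standard xml.etree.ElementTree module and uses just the title element from the sample, carrying its own namespace declarations so that it can be parsed on its own.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/HTML/1998/html4"

# The title element from the sample, with the declarations moved onto it
# so the fragment is a complete document.
FRAGMENT = (
    '<xdc:title xmlns:xdc="http://www.xml.com/books" '
    'xmlns:h="http://www.w3.org/HTML/1998/html4" '
    'h:style="font-family: sans-serif;">XML: A Primer</xdc:title>'
)

title = ET.fromstring(FRAGMENT)

# The prefixed attribute is stored under "{namespace}style" ...
qualified_style = title.get("{%s}style" % HTML_NS)
# ... so asking for a bare "style" attribute finds nothing.
bare_style = title.get("style")

print(qualified_style)
print(bare_style)
```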
That example above is, perhaps, kind of ugly, with all those prefixes and colons cluttering up the tags. The Namespaces Recommendation allows you to declare a default namespace and leave out some prefixes, like this:
<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:xdc="http://www.xml.com/books">
 <head><title>Book Review</title></head>
 <body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
   <table>
    <tr align="center">
     <td>Author</td><td>Price</td>
     <td>Pages</td><td>Date</td></tr>
    <tr align="left">
     <td><xdc:author>Simon St. Laurent</xdc:author></td>
     <td><xdc:price>31.98</xdc:price></td>
     <td><xdc:pages>352</xdc:pages></td>
     <td><xdc:date>1998/01</xdc:date></td>
    </tr>
   </table>
  </xdc:bookreview>
 </body>
</html>
In this example, anything without a prefix is assumed to be in the http://www.w3.org/HTML/1998/html4 namespace, which we're using as the namespace name for HTML (presumably, now that namespaces are official, the W3C will give HTML an official namespace name).
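The effect of a default namespace can be checked with a short sketch, again assuming Python's standard xml.etree.ElementTree module and a shortened copy of the sample. The prefixes h and b in the lookup table are local to this program and chosen arbitrarily; they need not match anything in the document.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/HTML/1998/html4"
BOOKS_NS = "http://www.xml.com/books"

# A shortened copy of the default-namespace sample.
DOC = """<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:xdc="http://www.xml.com/books">
 <head><title>Book Review</title></head>
 <body>
  <xdc:bookreview>
   <xdc:title>XML: A Primer</xdc:title>
  </xdc:bookreview>
 </body>
</html>"""

root = ET.fromstring(DOC)

# Prefixes used only inside this program's queries.
prefixes = {"h": HTML_NS, "b": BOOKS_NS}

# The unprefixed <title> is in the HTML namespace via the default declaration.
page_title = root.find("h:head/h:title", prefixes).text
# The prefixed <xdc:title> is in the book-review namespace.
book_title = root.find(".//b:title", prefixes).text

print(root.tag)
print(page_title)
print(book_title)
```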
One of the confusing things about all this is that namespace names are URLs; it's easy to assume that since they're Web addresses, they must be the address of something. They needn't be; these are URLs, but the Namespaces Recommendation doesn't care what (if anything) they point at. Think about the example of the XML.com programmer looking for book titles; that works fine without the namespace name pointing at anything.
The reason that the W3C decided to use URLs as namespace names is that they contain domain names (e.g. www.xml.com), which work globally across the Internet.
That's more or less all there is to it. The only purpose of namespaces is to give programmers a helping hand, enabling them to process the tags and attributes they care about and ignore those that don't matter to them.
Quite a few people, after reading earlier drafts of the Namespaces Recommendation, decided that namespaces were actually a facility for modular DTDs, or were trying to duplicate the function of SGML's "Architectural Forms". Neither of these theories is true. The only reason namespaces exist, once again, is to give elements and attributes programmer-friendly names that will be unique across the whole Internet.
Namespaces are a simple, straightforward, unglamorous piece of syntax. But they are crucial for the future of XML programming. Because this is important, we at XML.com will soon be posting an Annotated Namespaces, in a style similar to our Annotated XML 1.0.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.