Namespace Nuances
Q: I'm trying to validate an XML file. It uses XML namespaces, but I can't figure out how to express them inside the DTD. Here's a sample XML document:
<?xml version="1.0"?>
<!DOCTYPE checkbook SYSTEM "checkbook.dtd">
<checkbook xmlns:f="http://schemas.ar-ent.net/soap/file/"
           xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
           xmlns:m="http://schemas.ar-ent.net/test/soap.tr/checkbook/"
           xmlns:ars="http://schemas.ar-ent.net/soap/">
  <f:deposit type="direct-deposit">
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
    <date>21-6-00</date>
    <description category="income">Paycheck</description>
  </f:deposit>
</checkbook>
And here's a portion of the DTD, covering the markup included above:
<!ELEMENT checkbook (deposit|payment)*>
<!ELEMENT deposit (payor, amount, date, description?)>
<!ATTLIST deposit
  type (cash|check|direct-deposit|transfer) #REQUIRED>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT payor (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
  category (cash|entertainment|food|income|work) 'food'>
And this is the error message I'm getting:
Unknown element 'f:deposit'
A: I can see why you're frustrated. It certainly appears as though you've got everything accounted for in your document, with all those namespace declarations.
To help resolve your problem, let's review the basics of namespaces and their declarations, covering only what you need to know to deal with this particular problem. (A terrific resource for all kinds of questions about namespaces is Ron Bourret's XML Namespaces FAQ.)
Namespaces enable you to mix, in one XML document, element (and sometimes attribute) names from more than one XML vocabulary. Let's assume you've got a document in some vocabulary which looks, in part, like the following:
<furniture>
  <table material="mahogany" type="dining"/>
  <chair material="mahogany" type="dining"/>
  <chair material="mahogany" type="dining"/>
  <lamp material="brass" type="chandelier"/>
</furniture>
You show this to your boss at the furniture warehouse. He's not exactly the brightest bulb in the lamp, but he is your boss, and he says, "Well, that's okay. But I really want you to enclose all those individual types of furniture in a table."
"A table?" you ask.
"Sure. Like in a Web page. Rows and columns. A table."
What he's looking for, in short, would be something like this:
<furniture>
  <table>
    <tr>
      <td><table material="mahogany" type="dining"/></td>
      <td><chair material="mahogany" type="dining"/></td>
      <td><chair material="mahogany" type="dining"/></td>
      <td><lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furniture>
See the problem? You've got two element types, representing two
different things, both called table. You need to
disambiguate the names -- that is, make it clear which kind of
table element you're referring to at any point in the
document.
To declare a namespace is to declare which vocabulary an element
name comes from. The specific device for doing so is a special
attribute, xmlns, which can be placed on any element in a
document. This attribute takes the form:
xmlns:prefix="namespaceURI"
This is what we refer to when we speak of a "namespace declaration."
As for the pieces of the declaration:
xmlns is required; it identifies this as an XML namespace declaration.
:prefix (note the leading colon) is optional. If you include it, all element names in the document from the indicated namespace (vocabulary) must be prefixed with these characters, followed by a colon. (We'll see examples in a moment.)
namespaceURI is required. It uniquely identifies a namespace in this document and perhaps in others. The term "URI" here is a little misleading; although it looks like a familiar Web URI, it needn't actually "point to" anything in particular. Even many of the standard W3C namespace URIs don't locate a document, like a DTD, which formally describes the vocabulary. The important feature of the namespace URI is that it be unique among all namespace URIs in the document.
For the furniture example above, you could do something like the following:
<furn:furniture xmlns:furn="http://myfurn/namespace"
                xmlns="http://www.w3.org/1999/xhtml">
  <table>
    <tr>
      <td><furn:table material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
      <td><furn:lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furn:furniture>
Now the furniture element and all its descendants will
"know about" the furn namespace prefix. (Namespaces, like
special attributes such as xml:lang, are in effect for
the scope of whatever element declares them, unless redefined by some
descendant.) The second namespace declaration asserts that all
unprefixed element names in the document also come from a
particular namespace (XHTML 1.0, in this case). Consequently, any
namespace-aware application which processes this document will
recognize two distinct types of table element in this
document: one from the furniture-related vocabulary and one from XHTML
1.0.
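To see what a namespace-aware application makes of this document, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree as the parser (no particular tool is named in this column), and the helper name expanded_names is made up for illustration. ElementTree reports every element name as {namespaceURI}localname, so the two kinds of table come out with visibly different names:

```python
import xml.etree.ElementTree as ET

# The furniture document from above, cut down to one row with two cells.
DOC = """\
<furn:furniture xmlns:furn="http://myfurn/namespace"
                xmlns="http://www.w3.org/1999/xhtml">
  <table>
    <tr>
      <td><furn:table material="mahogany" type="dining"/></td>
      <td><furn:lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furn:furniture>"""


def expanded_names(xml_text):
    """Return every element name, in document order, as '{uri}local'."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


names = expanded_names(DOC)
for name in names:
    print(name)
```

The unprefixed table, tr, and td elements pick up the XHTML URI from the default declaration, while everything carrying the furn: prefix picks up http://myfurn/namespace.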
You may have picked up on a couple of odd, unexplained features of the preceding document.
Once you start using namespaces in a particular document, you must
commit to going the whole way. In theory, only the names of the two
table element types needed to be disambiguated. In
practice, though, you use namespaces to disambiguate entire
vocabularies -- even the names of elements, like td and
chair above, which are already unambiguous. Thus, if you
decide to require that the furniture-type table have a
furn: prefix, you're committed to using that prefix on
the names of the furniture, chair, and
lamp elements as well.
As previously noted, a particular namespace prefix's associated URI
need not be the "web address" of anything in particular. There is
nothing at the URI http://myfurn/namespace (except a
"document not found" error). On the other hand, there's definitely
something at the
http://www.w3.org/1999/xhtml URI associated with the
"empty namespace prefix." (I'll leave for you the exercise of
inspecting that "something.")
But the above document introduces some more profound questions.
The first mystery is that the document above no longer contains
two distinct elements named table. It now contains a
table element and a furn:table element. As far as XML 1.0
itself (and therefore any DTD) is concerned, the prefix is simply
part of the element name.
The second mystery is the real killer, and it's the reason why the
original questioner is having trouble with the checkbook
application. If you mix element types from two different vocabularies,
how can you possibly validate a document at all, given that a valid
document may contain no more than one DOCTYPE
declaration, referencing no more than one DTD?
The answer is weird but also (once you think about it) obvious. Either (a) you can't validate it at all, or (b) you can validate it only if you include, in the one referenced DTD, all element names -- including their prefixes and all namespace-declaring attributes.
Case (a) isn't as outlandish an option as you might imagine. It's
one of the most common solutions, thanks in part to XSLT's
popularity. An XSLT stylesheet must contain elements from the XSLT
vocabulary, such as xsl:stylesheet and
xsl:template, and these are intermingled in the
stylesheet with elements from the result tree vocabulary. Validating
an XSLT stylesheet is a remote -- but only remote -- possibility.
The whole thing works wonderfully using the simpler alternative of
well-formedness.
(For some reason, case (a) seems to drive many otherwise sane users of XML absolutely batty: "If I can't validate a document, how do I know it's correct?" This has never bothered me because in terms of XML 1.0 well-formedness is just as "correct" as validity. If a document works in an application that needs to use the document, who cares if it works in the framework of some other arbitrary application -- like a validating parser?)
That leaves case (b). For starters, the XML document with which this whole discussion
opened is a little strange -- given what you now know about
namespaces. Its root element, checkbook, declares four
namespace prefixes and their associated URIs: f,
s, m, and ars. Of these, only
one is actually used anywhere in the document: f, on the
f:deposit element. Furthermore, there is no
namespace declaration for the "empty prefix" -- which is actually, by
default, implicit in the names of all other elements in the document
(amount, date, and so on).
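A quick way to confirm which names actually land in a namespace is to run the document through a namespace-aware parser. The following is only a sketch, again assuming Python's standard xml.etree.ElementTree; the helper split_name and the dictionary in_namespace are illustrative names, and the DOCTYPE line is left out because no validation happens here:

```python
import xml.etree.ElementTree as ET

# The checkbook document from the question, without its DOCTYPE line.
DOC = """\
<checkbook xmlns:f="http://schemas.ar-ent.net/soap/file/"
           xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
           xmlns:m="http://schemas.ar-ent.net/test/soap.tr/checkbook/"
           xmlns:ars="http://schemas.ar-ent.net/soap/">
  <f:deposit type="direct-deposit">
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
    <date>21-6-00</date>
    <description category="income">Paycheck</description>
  </f:deposit>
</checkbook>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into (uri, local).

    The uri is None when the element is in no namespace.
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC)
in_namespace = {}
for element in root.iter():
    uri, local = split_name(element.tag)
    in_namespace[local] = uri
    print(local, "->", uri or "(no namespace)")
```

Only deposit reports a namespace URI. The s, m, and ars declarations are never referenced, and checkbook, payor, amount, date, and description are in no namespace at all.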
Let's assume that validation must be achieved somehow, that simple
well-formedness won't suffice. Let's also assume that the original
document is a fragment of a more complete one, which actually does at
some point need to use the s, m, and
ars prefixes as well as f. Here's how the
fragment of a DTD above, way back at the beginning, could be modified
to accommodate both validation and namespaces.
<!ELEMENT checkbook (f:deposit|payment)*>
<!ATTLIST checkbook
  xmlns:f   CDATA #FIXED "http://schemas.ar-ent.net/soap/file/"
  xmlns:s   CDATA #FIXED "http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:m   CDATA #FIXED "http://schemas.ar-ent.net/test/soap.tr/checkbook/"
  xmlns:ars CDATA #FIXED "http://schemas.ar-ent.net/soap/"
  xmlns     CDATA #FIXED "http://mycheckbookURI">
<!ELEMENT f:deposit (payor, amount, date, description?)>
<!ATTLIST f:deposit
  type (cash|check|direct-deposit|transfer) #REQUIRED>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT payor (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
  category (cash|entertainment|food|income|work) 'food'>
Now your application will find an element named
f:deposit in the DTD, whereas before the DTD declared
only an element named deposit (no prefix). And now the
rest of the document can use any of the four explicit prefixes on any
element name, as long as those names, including
prefixes, are declared in the DTD. If an element named
s:envelope appears in the document, an element named
s:envelope must be declared in the DTD. A declaration for
a simple envelope element won't suffice.
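The name-matching rule can be checked mechanically. What follows is a rough sketch, not a validator: it assumes Python's standard xml.parsers.expat module (a non-validating parser), uses cut-down versions of the two DTDs placed in an internal subset so the example is self-contained, and simply compares the element names the DTD declares with the names the document uses. The function name undeclared_names is invented for this illustration.

```python
import xml.parsers.expat

# Cut-down versions of the original and revised DTD fragments.
ORIGINAL_DTD = """\
<!ELEMENT checkbook (deposit|payment)*>
<!ELEMENT deposit (payor)>
<!ELEMENT payor (#PCDATA)>"""

REVISED_DTD = """\
<!ELEMENT checkbook (f:deposit|payment)*>
<!ATTLIST checkbook
  xmlns:f CDATA #FIXED "http://schemas.ar-ent.net/soap/file/">
<!ELEMENT f:deposit (payor)>
<!ELEMENT payor (#PCDATA)>"""

BODY = """\
<checkbook xmlns:f="http://schemas.ar-ent.net/soap/file/">
  <f:deposit><payor>Bob's Bolts</payor></f:deposit>
</checkbook>"""


def undeclared_names(dtd, body):
    """Return the element names used in body that the DTD never declares."""
    declared, used = set(), set()
    # Namespace processing is off by default, so names keep their prefixes,
    # which is exactly how a DTD sees them.
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    parser.StartElementHandler = lambda name, attrs: used.add(name)
    parser.Parse("<!DOCTYPE checkbook [\n" + dtd + "\n]>\n" + body, True)
    return used - declared


print(undeclared_names(ORIGINAL_DTD, BODY))  # {'f:deposit'}
print(undeclared_names(REVISED_DTD, BODY))   # set()
```

Against the original DTD the check turns up f:deposit, the same name the error message complained about. Against the revised DTD nothing is left over. A real validating parser would also check content models and attributes, which this sketch ignores.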
Simple? Probably not. Possible? You bet.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.