W3C XML Schema Made Simple
by Kohsuke Kawaguchi
If you haven't heard about notations, please be assured that you aren't missing anything. Notations exist only for backward compatibility. There is no need to learn about them.
If you do know notations, you should know that notations in W3C XML Schema are not compatible with notations in DTDs, because a Schema notation is a QName.
The following example is from the spec.
<xs:notation name="jpeg" public="image/jpeg" system="viewer.exe" />

<xs:element name="picture">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:hexBinary">
        <xs:attribute name="pictype">
          <xs:simpleType>
            <xs:restriction base="xs:NOTATION">
              <xs:enumeration value="jpeg"/>
              <xs:enumeration value="png"/>
              ...
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

<picture pictype="jpeg"> ... </picture>
This example is okay. But the following fragment is not valid even if the prefix "pic" is properly declared.
<pic:picture pictype="jpeg"> ... </pic:picture>
Confused? You have to write it as follows because it's a QName.
<pic:picture pictype="pic:jpeg"> ... </pic:picture>
Apparently, notations in W3C XML Schema fail to serve their only reason for existing, which is backward compatibility. There's really no reason to use notations. Notations are for SGML.
W3C XML Schema allows you to declare elements inside another element:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
But generally you should avoid this if possible, because the above schema does not match the following instance:
<person xmlns="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</person>
Rather, you have to write it as
<foo:person xmlns:foo="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</foo:person>
Not only does this require more typing, it is also a bad use of XML Namespaces. To avoid this problem, you should write
<!-- The default namespace declaration lets the unprefixed ref values
     resolve to declarations in the target namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string" />
  <xs:element name="firstName" type="xs:string" />
</xs:schema>
Another way to solve the problem is to add elementFormDefault="qualified" to the schema element. You can then safely use local element declarations. It probably isn't worth the effort to understand exactly what this means. Just understand that it makes the schema behave in the "right" way.
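As a minimal sketch of this second approach, here is the first person schema again with only elementFormDefault="qualified" added to the schema element. Nothing else is changed.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- Local declarations now belong to the target namespace. -->
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this attribute in place, the locally declared familyName and firstName are in the http://example.com namespace, so the instance that declares xmlns="http://example.com" on person should validate without any prefixes.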
In short, substitution groups are too complex to be a practical mechanism. There are two main difficulties.
- To write substitution groups correctly, you have to master the complex type, which is itself another beast that you should avoid.
- It is hard to tell which elements are actually substitutable.
Simply put, a substitution group is another way to write a
<choice>. So you can always use a
<choice> instead of a substitution group; and
<choice> is necessary anyway.
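To make the comparison concrete, here is a small sketch. The element names drawing, shape, circle, and square are hypothetical and chosen only for illustration; every element is a plain xs:string to keep type derivation out of the picture. First, the substitution group version:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  <!-- shape is the head; circle and square declare themselves as members. -->
  <xs:element name="shape" type="xs:string" />
  <xs:element name="circle" type="xs:string" substitutionGroup="shape" />
  <xs:element name="square" type="xs:string" substitutionGroup="shape" />

  <xs:element name="drawing">
    <xs:complexType>
      <xs:sequence>
        <!-- Accepts shape, and implicitly circle and square as well. -->
        <xs:element ref="shape" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

And the same content model written with an explicit choice and no substitution group:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  <xs:element name="shape" type="xs:string" />
  <xs:element name="circle" type="xs:string" />
  <xs:element name="square" type="xs:string" />

  <xs:element name="drawing">
    <xs:complexType>
      <!-- Every allowed child is listed where it is used. -->
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="shape" />
        <xs:element ref="circle" />
        <xs:element ref="square" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

In the second version a reader can see what drawing may contain by looking at its declaration alone. In the first, they have to search the whole schema for substitutionGroup attributes.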
To use substitution groups properly, first you have to learn complex types, then several additional attributes, rules to use them, and finally the effect of using them. Even if you manage to get through this brave new world, your document authors still need to follow the same path all over again because otherwise they can't write documents properly. What a pity.
If you still think you want to use substitution groups, it's not as easy as you think.
First, the content model of substitution group members must be related to each other by type derivation. That means you cannot write content models freely. Soon you'll find yourself writing an abstract element as a substitution group head with a strange content model, just to maintain proper derivations between members. That's not right.
Second, attributes to control the substitution behavior are difficult to use and understand. There is an attribute called block, which is one of the attributes you use to control the substitution group. There is another attribute called final, which basically takes one of "extension", "restriction", or "#all" as its value. final may look irrelevant to the substitution group, but the truth is that it's called "substitution group exclusions" internally and, as its name suggests, it controls the behavior of the substitution group. The internal name of the block attribute is "disallowed substitutions". Having trouble understanding the difference? Yeah, me too. Actually, both are used to control the substitution behavior, but in different ways.
The only way to prohibit the
substitution of element Y by element Z is to add
block="substitution" to Y. But even with the presence of
this attribute, it is not an error to have Z in the
substitution group of Y. It's just that you can never substitute Y
with Z in your documents.
Even worse, if Y designates yet another element X as its substitution group head ( X <- Y <- Z ), then it's okay to substitute X with Z.
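The X <- Y <- Z situation can be sketched as follows. The xs:string types and the http://example.com namespace are assumptions made for illustration; only the names X, Y, and Z and the block attribute come from the discussion above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  <!-- X is the head of Y's substitution group. -->
  <xs:element name="X" type="xs:string" />

  <!-- block="substitution" forbids other elements from appearing
       in places where Y itself is referenced. -->
  <xs:element name="Y" type="xs:string"
              substitutionGroup="X" block="substitution" />

  <!-- Declaring Z as a member of Y's group is still not an error. -->
  <xs:element name="Z" type="xs:string" substitutionGroup="Y" />
</xs:schema>
```

Following the description above, a content model that refers to Y would reject Z, while a content model that refers to X would still accept Z, because X itself carries no block attribute.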
All these things make it impractical to use a substitution group in the real world, although it may look harmless when you are experimenting. And that's why you should avoid it.
W3C XML Schema allows the
schema element without the
targetNamespace attribute. Some people call such schemas
chameleons. Why they are called that is irrelevant; what you
should know is to avoid them.
One reason is that it's highly likely that validators will have interoperability problems here. Another reason is that some people like to invent cool tricks by using a chameleon schema. But don't be fooled by those tricks; they are for schema hackers, not for ordinary good citizens.
Unfortunately, if you want to know exactly why you should avoid them, then you have to learn what they are. Consider the following chameleon schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- note that targetNamespace attribute is absent. -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string"/>
  <xs:element name="firstName" type="xs:string"/>
</xs:schema>
Then you write another schema file that includes the one above:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com">
  <xs:include schemaLocation="above.xsd" />
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="person" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
It seems OK, but actually it isn't. Look at the line <xs:element ref="familyName" /> in the chameleon schema. It looks like a reference to the familyName element. But it isn't. Since this chameleon schema is included by a schema with the target namespace http://example.com, the familyName element is in this namespace. So to refer to this declaration, you have to rewrite that line as
<xs:element ref="bp:familyName" xmlns:bp="http://example.com" />
Now what happens if you want to reuse this chameleon schema from a schema whose target namespace is something other than http://example.com? The answer: you can't.
As you can see, the sole merit of using the chameleon schema is gone.
Even worse, you can't detect this error in some validators because they think that those missing components may appear afterward.
There are many pitfalls in XML Schema that should be avoided, which will make your life easier because you'll have less to learn. And you won't lose the expressiveness of W3C XML Schema. Keep it simple and have a happy life!