XML.com: XML From the Inside Out

W3C XML Schema Made Simple
by Kohsuke Kawaguchi

Why You Should Avoid Notation Declarations

If you haven't heard about notations, please be assured that you aren't missing anything. Notations exist only for backward compatibility. There is no need to learn about them.

If you do know notations, you should know that notations in W3C XML Schema are not compatible with notations in DTDs, because a Schema notation is a QName.

The following example is from the spec.

<xs:notation name="jpeg"
             public="image/jpeg" system="viewer.exe" />

<xs:element name="picture">
 <xs:complexType>
  <xs:simpleContent>
   <xs:extension base="xs:hexBinary">
    <xs:attribute name="pictype">
     <xs:simpleType>
      <xs:restriction base="xs:NOTATION">
       <xs:enumeration value="jpeg"/>
       <xs:enumeration value="png"/>
        ...
      </xs:restriction>
     </xs:simpleType>
    </xs:attribute>
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>
</xs:element>

<picture pictype="jpeg"> ... </picture>

This example is okay. But the following fragment is not valid even if the prefix "pic" is properly declared.

<pic:picture pictype="jpeg"> ... </pic:picture>

Confused? The attribute value is a QName, so it has to carry the prefix as well:

<pic:picture pictype="pic:jpeg"> ... </pic:picture>
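
To see the mismatch in one place, here is a minimal sketch rather than the spec's own example. It assumes a hypothetical target namespace, http://example.com/pic, and declares a png notation that the spec fragment leaves out. Both are illustrative only.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/pic"
           targetNamespace="http://example.com/pic">
  <!-- both notations are declared in the target namespace -->
  <xs:notation name="jpeg" public="image/jpeg" />
  <xs:notation name="png" public="image/png" />

  <xs:element name="picture">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:hexBinary">
          <xs:attribute name="pictype">
            <xs:simpleType>
              <xs:restriction base="xs:NOTATION">
                <!-- unprefixed values pick up the default namespace above -->
                <xs:enumeration value="jpeg"/>
                <xs:enumeration value="png"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- should be rejected: "jpeg" is unprefixed and no default namespace is in scope -->
<pic:picture xmlns:pic="http://example.com/pic" pictype="jpeg">FFD8</pic:picture>

<!-- should be accepted: the value is qualified just like the element name -->
<pic:picture xmlns:pic="http://example.com/pic" pictype="pic:jpeg">FFD8</pic:picture>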

So Schema notations fail at backward compatibility, which is their only reason for existing. There's really no reason to use them. Notations are for SGML.

Why You Should Avoid Local Declarations

W3C XML Schema allows you to declare elements inside another element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://example.com">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

But generally you should avoid this if possible, because the above schema does not match the following instance:

<person xmlns="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</person>

Rather, you have to write it as

<foo:person xmlns:foo="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</foo:person>

Not only does this require more typing, it is also a bad use of XML Namespaces. To avoid this problem, you should write

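<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <!-- the default namespace lets the unprefixed ref values resolve to the target namespace -->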
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string" />
  <xs:element name="firstName" type="xs:string" />
</xs:schema>

Another way to solve the problem is to add elementFormDefault="qualified" to the schema element. You can then safely use local element declarations. It probably isn't worth the effort to understand exactly what this means. Just understand that it makes the schema behave in the "right" way.
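
As a minimal sketch of that second option, here is the first schema of this section again with the one attribute added. Nothing else changes.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://example.com"
      elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- local declarations now belong to the target namespace too -->
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

With this schema, the instance that uses a single default namespace is the one that validates:

<person xmlns="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</person>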

Why You Should Avoid Substitution Groups

In short, substitution groups are too complex to be a practical mechanism. There are two main difficulties.

  • To write substitution groups correctly, you have to master the complex type, which is itself another beast that you should avoid.
  • It is hard to tell which elements are actually substitutable.

Simply put, a substitution group is another way to write a <choice>. So you can always use a <choice> instead of a substitution group; and <choice> is necessary anyway.
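
A rough sketch of that equivalence follows. The element names (shape, circle, square, drawing) are made up for illustration, and the schema has no target namespace to keep it short. First, the substitution group version:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shape" type="xs:string" />
  <!-- circle and square may appear wherever shape is referenced -->
  <xs:element name="circle" type="xs:string" substitutionGroup="shape" />
  <xs:element name="square" type="xs:string" substitutionGroup="shape" />

  <xs:element name="drawing">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="shape" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

And the same content model written with <choice>, where the alternatives are visible right where they are used:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shape" type="xs:string" />
  <xs:element name="circle" type="xs:string" />
  <xs:element name="square" type="xs:string" />

  <xs:element name="drawing">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="shape" />
        <xs:element ref="circle" />
        <xs:element ref="square" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>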

To use substitution groups properly, first you have to learn complex types, then several additional attributes, rules to use them, and finally the effect of using them. Even if you manage to get through this brave new world, your document authors still need to follow the same path all over again because otherwise they can't write documents properly. What a pity.

If you still think you want to use substitution groups, it's not as easy as you think.

First, the content models of substitution group members must be related to one another by type derivation. That means you cannot write content models freely. Soon you'll find yourself writing an abstract element as a substitution group head with a strange content model, just to maintain proper derivations between members. That's not right.

Second, the attributes that control substitution behavior are difficult to use and understand. One of them is block, which takes a list of "extension", "restriction", and "substitution", or "#all", as its value. Another is final, which takes a list of "extension" and "restriction", or "#all".

final may look irrelevant to the substitution group, but the truth is that it's called "substitution group exclusions" internally and, as its name suggests, it controls the behavior of the substitution group. The internal name of the block attribute is "disallowed substitutions". Having trouble understanding the difference? Yeah, me too. Actually, both are used to control the substitution behavior, but in different ways.

The only way to prohibit the substitution of element Y by element Z is to add block="substitution" to Y. But even with the presence of this attribute, it is not an error to have Z in the substitution group of Y. It's just that you can never substitute Y with Z in your documents.

Even worse, if Y designates yet another element X as its substitution group head ( X <- Y <- Z ), then it's okay to substitute X with Z.
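
The chain described above can be sketched as follows. The wrapper elements onlyY and anyX are hypothetical, added only so there is somewhere to reference Y and X.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="X" type="xs:string" />
  <!-- Y refuses substitutes for itself, but still joins X's group -->
  <xs:element name="Y" type="xs:string"
              substitutionGroup="X" block="substitution" />
  <!-- a legal declaration, even though Z can never stand in for Y -->
  <xs:element name="Z" type="xs:string" substitutionGroup="Y" />

  <xs:element name="onlyY">
    <xs:complexType>
      <xs:sequence>
        <!-- accepts Y only, because of block="substitution" on Y -->
        <xs:element ref="Y" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="anyX">
    <xs:complexType>
      <xs:sequence>
        <!-- accepts X, Y, or Z: membership carries through Y to X -->
        <xs:element ref="X" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>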

All these things make it impractical to use a substitution group in the real world, although it may look harmless when you are experimenting. And that's why you should avoid it.

Why You Should Avoid Chameleon Schemas

W3C XML Schema allows a schema element without the targetNamespace attribute. Some people call such schemas chameleons. Why they are called that is irrelevant; what you should know is that you should avoid them.

One reason is that it's highly likely that validators will have interoperability problems here. Another reason is that some people like to invent cool tricks by using a chameleon schema. But don't be fooled by those tricks; they are for schema hackers, not for ordinary good citizens.

Unfortunately, if you want to know exactly why you should avoid them, then you have to learn what they are. Consider the following chameleon schema.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- note that targetNamespace attribute is absent. -->
  
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string"/>
  <xs:element name="firstName" type="xs:string"/>
</xs:schema>

Then you write another schema file and include the one above by using the include element.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  
  <xs:include schemaLocation="above.xsd" />
  
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="person" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

It seems OK, but actually it isn't. Look at the line <xs:element ref="familyName" /> in the chameleon schema. It looks like a reference to the familyName element declared a few lines below it. But it isn't. Since this chameleon schema is included by a schema with targetNamespace="http://example.com", the familyName declaration ends up in that namespace, while the unprefixed reference does not. So to refer to this declaration, you have to rewrite that line as

<xs:element ref="bp:familyName" xmlns:bp="http://example.com" />

Now what happens if you want to reuse this chameleon schema from a schema whose target namespace is http://www.foo.com? The answer: you can't.

As you can see, the sole merit of using the chameleon schema is gone.

Even worse, you can't detect this error in some validators because they think that those missing components may appear afterward.
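
One way to stay clear of the problem, sketched here under assumptions of my own, is to give the shared schema its own target namespace and pull it in with import instead of include. The namespace http://example.com/person, the prefix p, and the file name person.xsd are all hypothetical.

<!-- person.xsd: the shared declarations, now with a fixed namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://example.com/person"
           targetNamespace="http://example.com/person">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="p:familyName" />
        <xs:element ref="p:firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string" />
  <xs:element name="firstName" type="xs:string" />
</xs:schema>

<!-- the schema that reuses it -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:p="http://example.com/person"
           targetNamespace="http://example.com">

  <xs:import namespace="http://example.com/person"
             schemaLocation="person.xsd" />

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="p:person" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Every reference now means the same thing no matter which schema imports person.xsd, so a schema whose target namespace is http://www.foo.com can reuse it in exactly the same way.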

Conclusion

W3C XML Schema has many pitfalls. Avoiding them makes your life easier, because you'll have less to learn, and you won't lose any of the expressiveness of W3C XML Schema. Keep it simple and have a happy life!


