
W3C XML Schema Made Simple
by Kohsuke Kawaguchi

Why You Should Avoid Notation Declarations

If you haven't heard about notations, please be assured that you aren't missing anything. Notations exist only for backward compatibility. There is no need to learn about them.

If you do know notations, you should know that notations in W3C XML Schema are not compatible with notations in DTDs, because a Schema notation is a QName.

The following example is from the spec.

<xs:notation name="jpeg"
             public="image/jpeg" system="viewer.exe" />

<xs:element name="picture">
 <xs:complexType>
  <xs:simpleContent>
   <xs:extension base="xs:hexBinary">
    <xs:attribute name="pictype">
     <xs:simpleType>
      <xs:restriction base="xs:NOTATION">
       <xs:enumeration value="jpeg"/>
       <xs:enumeration value="png"/>
        ...
      </xs:restriction>
     </xs:simpleType>
    </xs:attribute>
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>
</xs:element>

<picture pictype="jpeg"> ... </picture>

This example is okay. But the following fragment is not valid even if the prefix "pic" is properly declared.

<pic:picture pictype="jpeg"> ... </pic:picture>

Confused? You have to write it as follows because it's a QName.

<pic:picture pictype="pic:jpeg"> ... </pic:picture>

So notations fail to serve their only reason for existing, which is backward compatibility. There's really no reason to use them. Notations are for SGML.

Why You Should Avoid Local Declarations

W3C XML Schema allows you to declare elements inside another element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://example.com">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

But generally you should avoid this if possible, because the above schema does not match the following instance:

<person xmlns="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</person>

Rather, you have to write it as

<foo:person xmlns:foo="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</foo:person>

Not only does this require more typing, it is also a bad use of XML Namespaces. To avoid this problem, you should write

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string" />
  <xs:element name="firstName" type="xs:string" />
</xs:schema>

Another way to solve the problem is to add elementFormDefault="qualified" to the schema element. You can then safely use local element declarations. It probably isn't worth the effort to understand exactly what this means. Just understand that it makes the schema behave in the "right" way.
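
As a minimal sketch of this second approach, the first schema in this section could be written as follows. The only change is the added elementFormDefault attribute; everything else is taken from the example above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://example.com"
      elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- Local declarations; "qualified" puts them in the target namespace. -->
        <xs:element name="familyName" type="xs:string" />
        <xs:element name="firstName" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, the instance that uses a single default namespace should be accepted:

```xml
<person xmlns="http://example.com">
  <familyName> KAWAGUCHI </familyName>
  <firstName> Kohsuke </firstName>
</person>
```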

Why You Should Avoid Substitution Groups

In short, substitution groups are too complex to be a practical mechanism. There are two main difficulties.

  • To write substitution groups correctly, you have to master the complex type, which is itself another beast that you should avoid.
  • It is hard to tell which elements are actually substitutable.

Simply put, a substitution group is another way to write a <choice>. So you can always use a <choice> instead of a substitution group; and <choice> is necessary anyway.
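
To make the comparison concrete, here is a small sketch of the same content model written both ways. The element names note, warning, and memo are hypothetical and chosen only for illustration; the namespace is reused from the earlier examples. First, the substitution group version:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <xs:element name="note" type="xs:string" />
  <!-- "warning" joins the substitution group headed by "note". -->
  <xs:element name="warning" type="xs:string" substitutionGroup="note" />

  <xs:element name="memo">
    <xs:complexType>
      <xs:sequence>
        <!-- Accepts <note> or <warning>, but nothing on this line says so. -->
        <xs:element ref="note" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

And the version that uses <choice>, where the alternatives are listed at the point of use:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <xs:element name="note" type="xs:string" />
  <xs:element name="warning" type="xs:string" />

  <xs:element name="memo">
    <xs:complexType>
      <!-- The permitted children are visible right here. -->
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="note" />
        <xs:element ref="warning" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Both schemas are meant to accept a memo containing any mix of note and warning elements. The second one simply states that fact in one place.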

To use substitution groups properly, first you have to learn complex types, then several additional attributes, rules to use them, and finally the effect of using them. Even if you manage to get through this brave new world, your document authors still need to follow the same path all over again because otherwise they can't write documents properly. What a pity.

If you still think you want to use substitution groups, it's not as easy as you think.

First, the content model of substitution group members must be related to each other by type derivation. That means you cannot write content models freely. Soon you'll find yourself writing an abstract element as a substitution group head with a strange content model, just to maintain proper derivations between members. That's not right.

Second, attributes to control the substitution behavior are difficult to use and understand. There is an attribute called block, which is one of the attributes you use to control the substitution group. There is another attribute called final, which basically takes one of "extension", "restriction", or "#all" as its value.

final may look irrelevant to the substitution group, but the truth is that it's called "substitution group exclusions" internally and, as its name suggests, it controls the behavior of the substitution group. The internal name of the block attribute is "disallowed substitutions". Having trouble understanding the difference? Yeah, me too. Actually, both are used to control the substitution behavior, but in a different way.

The only way to prohibit the substitution of element Y by element Z is to add block="substitution" to Y. But even with the presence of this attribute, it is not an error to have Z in the substitution group of Y. It's just that you can never substitute Y with Z in your documents.

Even worse, if Y designates yet another element X as its substitution group head ( X <- Y <- Z ), then it's okay to substitute X with Z.
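
The situation described above can be sketched as follows. The names X, Y, and Z come from the text; the xs:string type and the namespace are assumptions made only to keep the fragment complete.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns="http://example.com"
      targetNamespace="http://example.com">
  <!-- X is the outermost head and carries no block attribute. -->
  <xs:element name="X" type="xs:string" />

  <!-- Y can stand in for X, and blocks substitution of itself. -->
  <xs:element name="Y" type="xs:string"
              substitutionGroup="X" block="substitution" />

  <!-- Declaring Z with Y as its head is not an error,
       but Z can never appear where Y is expected.
       Z is still accepted where X is expected. -->
  <xs:element name="Z" type="xs:string" substitutionGroup="Y" />
</xs:schema>
```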

All these things make it impractical to use a substitution group in the real world, although it may look harmless when you are experimenting. And that's why you should avoid it.

Why You Should Avoid Chameleon Schemas

W3C XML Schema allows the schema element without the targetNamespace attribute. Some people call such schemas chameleons. Why they are called that is irrelevant; what you should know is to avoid them.

One reason is that it's highly likely that validators will have interoperability problems here. Another reason is that some people like to invent cool tricks by using a chameleon schema. But don't be fooled by those tricks; they are for schema hackers, not for ordinary good citizens.

Unfortunately, if you want to know exactly why you should avoid them, then you have to learn what they are. Consider the following chameleon schema.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- note that targetNamespace attribute is absent. -->
  
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string"/>
  <xs:element name="firstName" type="xs:string"/>
</xs:schema>

Then you write another schema file and include the one above by using the include element.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  
  <xs:include schemaLocation="above.xsd" />
  
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="person" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

It seems OK, but actually it isn't. Look at the line <xs:element ref="familyName" /> in the chameleon schema. It looks like a reference to the familyName element, but it isn't. Since this chameleon schema is included by a schema with targetNamespace="http://example.com", the familyName element is in that namespace. So to refer to this declaration, you have to rewrite that line as

<xs:element ref="bp:familyName" xmlns:bp="http://example.com" />

Now what happens if you want to reuse this chameleon schema from a schema whose target namespace is http://www.foo.com? The answer: you can't.

As you can see, the sole merit of using the chameleon schema is gone.

Even worse, you can't detect this error in some validators because they think that those missing components may appear afterward.
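
For comparison, here is a sketch of the same included file written as an ordinary, non-chameleon schema. It assumes the file is only ever meant for the http://example.com namespace, and it adds a default namespace declaration so that the unprefixed references resolve to that namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com"
           targetNamespace="http://example.com">
  <!-- targetNamespace is present, so this file is no longer a chameleon. -->

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- Unprefixed names resolve through the default namespace above. -->
        <xs:element ref="familyName" />
        <xs:element ref="firstName" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="familyName" type="xs:string"/>
  <xs:element name="firstName" type="xs:string"/>
</xs:schema>
```

A schema with the same targetNamespace can still pull this file in with the include element, and there is no question about which namespace each reference points to.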

Conclusion

XML Schema has many pitfalls. Avoiding them will make your life easier because you'll have less to learn, and you won't lose the expressiveness of W3C XML Schema. Keep it simple and have a happy life!




  • I'm clueless
    2009-05-02 21:47:53 FlutterVertigo@gmail.com

    I understand the end part; i.e., what the structure is like. But what is it derived from? How do I know how similar it is to my goal?


    It's like a problem on a math test: you have to show your work instead of just putting the answer down. If shorter steps are a turn-off to many people, or providing the necessary material step-by-step is beneath other readers, stick a little control in there to expand or shrink it as desired.

  • complex models
    2009-05-02 21:27:10 FlutterVertigo@gmail.com

    It's difficult to understand the XML structure when we see the goal, but not the original material. It would help to see the process step by step, like a math problem on a test where you have to show your work instead of just writing down the answer.


  • Need some experience to understand
    2004-09-13 06:53:50 ABR

    This is a great article, even though the advice it provides will not work well in every situation. Unfortunately, as the comments here indicate, it's difficult to appreciate unless you've gotten your hands dirty. Many of the complicated features of XML Schema LOOK manageable in the abstract, or when dealing with toy examples, but when you actually try designing a real modular, extensible system of multiple schemas, you begin to see what Mr. Kawaguchi is talking about.


    XML Schema has tried to provide everything that both markup languages and object-oriented systems can offer, and whether because of the roughness of a first attempt or the inherent complexity of the task, it is widely held to be confusing and difficult to use. The poor state of implementations several years after Recommendation status speaks to this aptly. It is well worth taking a look at an alternative, Relax-NG, which strives to maintain an elegant, spare framework while providing the same expressive power. It has some of its own problems, but generally does better than XML Schema in the usability department.


  • complex types
    2003-09-22 10:34:50 Karanjit Siyan

    Global complexTypes defined as named types are very useful for reusing existing types. Derivation by extension is also very useful for deriving an object-oriented content system which could be mapped using data binding to language classes in Java, etc.


    The article makes some good points about the derivation by restriction, but oversimplifies and therefore misstates things.


    -- Karanjit Siyan

  • Any non-trivial schema design...
    2002-04-23 16:31:49 Bruce Grant

    While I agree that it does chafe to hear someone say, "you do not need to know..." the fact is that he's right.


    Any non-trivial schema design, in my case several hundred unique sub-systems, each represented as an individual schema to create a view of the entire system, will prove that his commentary hits close to home. Perhaps it would be more accurate to say that current deficiencies in specific aspects of Schema make these aspects worth ignoring until they are improved. But then, since when have developers cared about being PC.


    With some simple changes to restriction and extension, I would once again use complexType widely. Until they do, forget it. I have real work to do.

  • Ignorance is bliss!
    2001-08-31 14:35:15 Cathy Gregory

    Yes, XML Schema can be easy to learn and use, but the author doesn't seem to have bothered to do so.


    He says not to use complex types, then gives an example of one, but brushes off the <complexType> as just a placeholder. Not using complex types would mean that your XML has one element containing text (no child elements or attributes), and that's it!


    Throughout the article, his justifications for not using global/local declarations are false and much stems from his lack of understanding (elementFormDefault and attributeFormDefault). The same goes for most of his 'Do-Not-Uses'.


    Poor examples were used to justify poor understanding with the conclusion being 'do not use'.


    The following article is far better - thanks Donald.
    http://www.xml.com/pub/a/2001/08/22/easyschema.html


    Cat G.

  • Too simple, too lazy
    2001-06-12 15:13:22 Julio Andrade

    Mr Kohsuke,


    the argument "you don't want to spend the time learning" seems to be a great one for you. I hope people delve into the capabilities of XML Schema.


    If you are doing something real with XML Schemas you WILL have to learn how to use it properly. The possibility of creating a whole type system, based on named complexTypes, is huge! The implications of not understanding the use of qualified locals, importing namespaces, and other aspects deemed too difficult will come back and bite you hard...


    Not a very good job Mr editor. As a reader I want articles that widen my understanding of XML...


    Julio

  • If you're going to use something, why not use it right?
    2001-06-12 14:14:11 Robert May

    Yes, you can use schemas without understanding all of the details, but most of the high level things (which he throws out) aren't really that difficult to understand.


    This article is like telling people not to worry about the gear shift--if they rev the engine enough, they can start in 4th gear and they won't have to shift.


    It's your job to understand the technology you're using--you should do an excellent job, not just an adequate job.