XML.com: XML From the Inside Out

Comparing XML Schema Languages
by Eric van der Vlist

Our Sample Application

In the remainder of this article, I will be using the following simple library application to illustrate the use of the various schema languages.

<?xml version="1.0" encoding="utf-8"?>
<library>
  <book id="_0836217462">
    <isbn>0836217462</isbn>
    <title>Being a Dog Is a Full-Time Job</title>
    <author-ref id="Charles-M.-Schulz"/>
    <character-ref id="Peppermint-Patty"/>
    <character-ref id="Snoopy"/>
    <character-ref id="Schroeder"/>
    <character-ref id="Lucy"/>
  </book>
  <book id="_0805033106">
    <isbn>0805033106</isbn>
    <title>Peanuts Every Sunday </title>
    <author-ref id="Charles-M.-Schulz"/>
    <character-ref id="Sally-Brown"/>
    <character-ref id="Snoopy"/>
    <character-ref id="Linus"/>
    <character-ref id="Snoopy"/>
  </book>
  <author id="Charles-M.-Schulz">
    <name>Charles M. Schulz</name>
    <nickName>SPARKY</nickName>
    <born>November 26, 1922</born>
    <dead>February 12, 2000</dead>
  </author>
  <character id="Peppermint-Patty">
    <name>Peppermint Patty</name>
    <since>Aug. 22, 1966</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
  <character id="Snoopy">
    <name>Snoopy</name>
    <since>October 4, 1950</since>
    <qualification>extroverted beagle</qualification>
  </character>
  <character id="Schroeder">
    <name>Schroeder</name>
    <since>May 30, 1951</since>
    <qualification>
      brought classical music to the Peanuts strip
    </qualification>
  </character>
  <character id="Lucy">
    <name>Lucy</name>
    <since>March 3, 1952</since>
    <qualification>bossy, crabby and selfish</qualification>
  </character>
  <character id="Sally-Brown">
    <name>Sally Brown</name>
    <since>Aug. 22, 1960</since>
    <qualification>always looks for the easy way out</qualification>
  </character>
  <character id="Linus">
    <name>Linus</name>
    <since>Sept. 19, 1952</since>
    <qualification>the intellectual of the gang</qualification>
  </character>
</library>

DTDs

Overview

Author: W3C
Status: Recommendation ("embedded" in XML 1.0)
Location: http://www.w3.org/TR/REC-xml
PSVI: Yes
Structures: Yes
Datatypes: Yes (weak)
Integrity: Yes (internal through ID/IDREF/IDREFS attributes)
Rules: No
Vendor support: Excellent
Miscellaneous: Non-XML syntax; no support for namespaces. Schema definition is only one of the features of DTDs.

Inherited from SGML, the XML DTD is the most widely deployed means of defining an XML schema. Because DTDs are defined in the XML 1.0 Recommendation, they do not support namespaces, which were specified later. This limitation, together with a weak datatype system that applies only to attributes, was one of the main motivations for the W3C to develop a new schema language.

The DTD for our sample could be

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT author (name, nickName, born, dead)>
<!ATTLIST author
  id ID #REQUIRED
>
<!ELEMENT author-ref EMPTY>
<!ATTLIST author-ref
  id IDREF #REQUIRED
>
<!ELEMENT book (isbn, title, author-ref*, character-ref*)>
<!ATTLIST book
  id ID #REQUIRED
>
<!ELEMENT born (#PCDATA)>
<!ELEMENT character (name, since, qualification)>
<!ATTLIST character
  id ID #REQUIRED
>
<!ELEMENT character-ref EMPTY>
<!ATTLIST character-ref
  id IDREF #REQUIRED
>
<!ELEMENT dead (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT library (book+, author*, character*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT nickName (#PCDATA)>
<!ELEMENT qualification (#PCDATA)>
<!ELEMENT since (#PCDATA)>
<!ELEMENT title (#PCDATA)>
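
The weakness of the DTD datatype system can be seen in a minimal sketch. The document below is not part of the sample application: the format attribute and its two values are hypothetical additions, used only to show that attributes accept a few types (ID, IDREF, name tokens, enumerations) while element content can only be declared as #PCDATA.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book (isbn, title)>
  <!-- Attributes can be typed: here an ID and an enumeration.
       The format attribute is hypothetical, for illustration only. -->
  <!ATTLIST book
    id     ID                      #REQUIRED
    format (hardcover | paperback) "paperback"
  >
  <!-- Element content is plain character data: nothing in the DTD
       can state that isbn must look like an ISBN. -->
  <!ELEMENT isbn (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
]>
<library>
  <book id="_0836217462" format="hardcover">
    <isbn>not really an ISBN</isbn>
    <title>Being a Dog Is a Full-Time Job</title>
  </book>
</library>
```

A validating parser should accept this document, since the bogus isbn content is still character data; only a format value outside the enumeration, or a duplicated id, would be reported.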

W3C XML Schema Definition Language

Overview

Author: W3C
Status: Recommendation
Location: http://www.w3.org/TR/xmlschema-0/
PSVI: Yes
Structures: Yes
Datatypes: Yes
Integrity: Yes (internal through ID/IDREF/IDREFS and xs:unique/xs:key/xs:keyref)
Rules: No
Vendor support: Potentially excellent but currently still immature.
Miscellaneous: Borrows many ideas from OOP design; considered complex; paranoid about determinism; part of the foundation of XML in the vision of the W3C.

W3C XML Schema was published by the W3C to provide an alternative to the XML DTD that supports namespaces, facilitates the design of open and extensible vocabularies, and meets the requirement of data-oriented applications for a richer datatyping system. It does so by borrowing many features from OOP languages, and these features do not always fit the tree structure of XML documents easily. It is generally considered complex, partly because of the number of features and partly because of the style of the Recommendation, which describes the validation process more than the modeling features.

W3C XML Schema is a strongly typed schema language that eliminates any non-deterministic design from the described markup, to ensure that there is no ambiguity in the determination of the datatypes and that validation can be performed by a finite state machine.
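
To make the determinism requirement concrete, here is a minimal sketch. It assumes a hypothetical variant of book that carries either an isbn or an issn (issn is not part of the sample application). Written as (title, isbn) | (title, issn), the content model is ambiguous: on reading title, a processor cannot tell which branch it is in, and W3C XML Schema rejects such a model. Factoring out the common title gives an equivalent model that should be accepted:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- title is common to both alternatives, so it is declared once,
             before the choice; each element read now matches one particle. -->
        <xs:element name="title" type="xs:string"/>
        <xs:choice>
          <xs:element name="isbn" type="xs:string"/>
          <!-- issn is a hypothetical element, used only for this sketch. -->
          <xs:element name="issn" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The rewrite is easy here, but it has to be done by the schema author; not every ambiguous model can be factored this neatly.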

A W3C XML Schema schema for our sample could be

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="isbn" type="xs:string"/>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author-ref" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="id" type="xs:IDREF" use="required"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="character-ref" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="id" type="xs:IDREF" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="author" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="name"/>
              <xs:element name="nickName" type="xs:string"/>
              <xs:element name="born" type="xs:string"/>
              <xs:element name="dead" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="name"/>
              <xs:element name="since" type="xs:string"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
</xs:schema>
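
The schema above relies on xs:ID and xs:IDREF for integrity. As a sketch of the other mechanism listed in the overview, the reduced schema below uses xs:key and xs:keyref instead. It keeps only book, character-ref, and character; the constraint names characterKey and characterRef are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="character-ref" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="id" type="xs:string" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Every character must have an id, unique among characters. -->
    <xs:key name="characterKey">
      <xs:selector xpath="character"/>
      <xs:field xpath="@id"/>
    </xs:key>
    <!-- Every character-ref id must match the id of some character. -->
    <xs:keyref name="characterRef" refer="characterKey">
      <xs:selector xpath="book/character-ref"/>
      <xs:field xpath="@id"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

Unlike ID/IDREF, which share a single document-wide pool of identifiers, a key is scoped to the nodes chosen by its selector, so a character-ref can be required to point at a character and not at an author.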

RELAX NG

Overview

Author: OASIS and possibly ISO
Status: OASIS RELAX NG Committee Specification
Location: http://relaxng.org/
PSVI: No
Structures: Yes
Datatypes: No, but a modular mechanism has been defined to plug in datatype systems (W3C XML Schema part 2 and others if needed).
Integrity: No (except through ID/IDREF/IDREFS features of a datatype system)
Rules: No
Vendor support: To be seen.
Miscellaneous: Result of the merge between RELAX and TREX; might become an ISO TR. Strong mathematical grounding. Alternate non-XML syntax proposed by James Clark.

Its editors (James Clark and Murata Makoto) define RELAX NG as "the next generation schema language for XML: clean, simple and powerful". RELAX NG appears to be closer to a description of the instance documents in ordinary English and simpler than W3C XML Schema, to which it might become a serious alternative.

Many constraints, especially those which are on the fringe of non-deterministic models, can be expressed by RELAX NG and not by W3C XML Schema. Some combinations in document structures forbidden by W3C XML Schema can be described by RELAX NG.
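
As a sketch of such a structure, the pattern below lets a book carry its ISBN either as an attribute or as a child element, but not both. This is a hypothetical variation on the sample (the isbn attribute does not exist in the library document), chosen because RELAX NG treats attributes and elements uniformly inside a choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Exactly one of the two forms must be present. The isbn attribute
       is a hypothetical addition, used only for this sketch. -->
  <choice>
    <attribute name="isbn">
      <text/>
    </attribute>
    <element name="isbn">
      <text/>
    </element>
  </choice>
  <element name="title">
    <text/>
  </element>
</element>
```

W3C XML Schema, in the version discussed here, declares attributes separately from the content model, so a choice between an attribute and an element cannot be written this directly.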

Even though RELAX NG seems to be technically superior to W3C XML Schema, support by software vendors and XML developers is uncertain now that W3C XML Schema is a Recommendation.

A RELAX NG schema for our sample could be

<?xml version="1.0" encoding="UTF-8"?>
<grammar 
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" 
   xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <choice>
      <ref name="library"/>
    </choice>
  </start>
  <define name="library">
    <element name="library">
      <oneOrMore>
        <ref name="book"/>
      </oneOrMore>
      <zeroOrMore>
        <ref name="author"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="character"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="author">
    <element name="author">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <element name="name">
        <text/>
      </element>
      <element name="nickName">
        <text/>
      </element>
      <element name="born">
        <text/>
      </element>
      <element name="dead">
        <text/>
      </element>
    </element>
  </define>
  <define name="book">
    <element name="book">
      <ref name="id-attribute"/>
      <ref name="isbn"/>
      <ref name="title"/>
      <zeroOrMore>
        <element name="author-ref">
          <attribute name="id">
            <data type="IDREF"/>
          </attribute>
          <empty/>
        </element>
      </zeroOrMore>
      <zeroOrMore>
        <element name="character-ref">
          <attribute name="id">
            <data type="IDREF"/>
          </attribute>
          <empty/>
        </element>
      </zeroOrMore>
    </element>
  </define>
  <define name="id-attribute" >
    <attribute name="id">
      <data type="ID"/>
    </attribute>
  </define>
  <define name="character">
    <element name="character">
      <ref name="id-attribute"/>
      <ref name="name"/>
      <ref name="since"/>
      <ref name="qualification"/>
    </element>
  </define>
  <define name="isbn">
    <element name="isbn">
      <text/>
    </element>
  </define>
  <define name="name">
    <element name="name">
      <text/>
    </element>
  </define>
  <define name="nickName">
    <element name="nickName">
      <text/>
    </element>
  </define>
  <define name="qualification">
    <element name="qualification">
      <text/>
    </element>
  </define>
  <define name="since">
    <element name="since">
      <text/>
    </element>
  </define>
  <define name="title">
    <element name="title">
      <text/>
    </element>
  </define>
</grammar>

Schematron

Overview

Author: Rick Jelliffe and other contributors.
Status: Unofficial
Location: http://www.ascc.net/xml/schematron/
PSVI: No (not directly)
Structures: No (not directly)
Datatypes: No (not directly)
Integrity: No (not directly)
Rules: Yes, through XPath expressions
Vendor support: Low
Miscellaneous: Pure rule expression.

Schematron is an XPath/XSLT-based language for defining context-dependent rules. Schematron doesn't directly support structure or datatype validation, but a schema author may write rules that implement these structure and datatype checks. To write a full schema with Schematron, the author must take care to include all the rules needed to describe the structure of the document.

A partial Schematron schema for our sample could be

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Schematron Schema for library</sch:title>
  <sch:pattern>
    <sch:rule context="/">
      <sch:assert test="library">
        The document element should be "library".
        </sch:assert>
    </sch:rule>
    <sch:rule context="/library">
      <sch:assert test="book">
        There should be at least a book!
        </sch:assert>
      <sch:assert test="not(@*)">
        No attribute for library, please!
        </sch:assert>
    </sch:rule>
    <sch:rule context="/library/book">
      <sch:assert test="not(following-sibling::book/@id=@id)">
        Duplicated ID for this book.
        </sch:assert>
      <sch:assert test="@id=concat('_', isbn)">
        The id should be derived from the ISBN.
        </sch:assert>
    </sch:rule>
    <sch:rule context="/library/*">
      <sch:assert test="name()='book' or name()='author' or name()='character'">
        This element shouldn't be here...
        </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
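
The same approach extends to integrity checks, which Schematron does not support directly. The following is a minimal sketch, not a complete schema: it assumes the sample library document and only verifies that each reference points at an existing author or character.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Reference checks for library (sketch)</sch:title>
  <sch:pattern>
    <sch:rule context="/library/book/author-ref">
      <!-- True when at least one author has the same id value. -->
      <sch:assert test="@id = /library/author/@id">
        An author-ref should point to an existing author.
      </sch:assert>
    </sch:rule>
    <sch:rule context="/library/book/character-ref">
      <!-- True when at least one character has the same id value. -->
      <sch:assert test="@id = /library/character/@id">
        A character-ref should point to an existing character.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Each test compares the reference with a node set, so it succeeds as soon as one matching id is found. Because the rules name the target element explicitly, they also catch a character-ref that points at an author, which a plain ID/IDREF check would let through.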
