RELAX NG, Compared
January 23, 2002
Introduction
This article is a companion to two works already published on XML.com: my introduction to W3C XML Schema, a tutorial that introduces the language's main features in a progression I hope is intuitive; and my comparison of the main schema languages, an attempt at an objective, practical, feature-by-feature comparison of XML schema languages. In this new article, I follow the same approach as the W3C XML Schema tutorial, but this time I implement the schemas using RELAX NG.
While the result is neither an optimal tutorial for RELAX NG -- since the progression designed for W3C XML Schema is not an ideal way to get started with RELAX NG -- nor an impartial comparison between the two languages, I think it provides a good starting point for those of us who know W3C XML Schema and want to quickly identify the differences with RELAX NG. Links are provided throughout to the corresponding sections of the W3C XML Schema tutorial, and you are encouraged to follow both simultaneously.
Introducing Our First Schema
Table of Contents |
•Introducing Our First
Schema |
[ Corresponding chapter for W3C XML Schema]
The example document is the same one we used in our W3C XML Schema tutorial:
<?xml version="1.0" encoding="UTF-8"?>
<book isbn="0836217462">
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>
      extroverted beagle
    </qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
We will follow the same design style that we used for our first W3C XML Schema to describe the document and will design it as a "Russian doll".
A RELAX NG schema is very close to a textual description of a vocabulary. To describe this document, we could say that we define a grammar starting with an element named book and this is pretty much what we will write as a RELAX NG schema.
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="book">
      .../...
    </element>
  </start>
</grammar>
To describe the element named book, we could say that it is composed of an attribute named isbn, an element named title, an element named author and zero or more elements named character:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> <element name="book"> <attribute name="isbn"> .../... </attribute> <element name="title"> .../... </element> <element name="author"> .../... </element> <zeroOrMore> <element name="character"> .../... </element> </zeroOrMore> </element> </start> </grammar>
RELAX NG has a clear separation between structure and datatypes, and we will see later on how we can plug a datatype system into our schema. For the moment, we will just consider that the values are not typed, i.e. that they are just text, and say so:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> <element name="book"> <attribute name="isbn"> <text/> </attribute> <element name="title"> <text/> </element> <element name="author"> <text/> </element> <zeroOrMore> <element name="character"> .../... </element> </zeroOrMore> </element> </start> </grammar>
The last thing that we need to do is to define the element named character. This can be done the same way by saying that it is composed of an element named name, an optional element named friend-of, an element named since, and an element named qualification.
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> <element name="book"> <attribute name="isbn"> <text/> </attribute> <element name="title"> <text/> </element> <element name="author"> <text/> </element> <zeroOrMore> <element name="character"> <element name="name"> <text/> </element> <optional> <element name="friend-of"> <text/> </element> </optional> <element name="since"> <text/> </element> <element name="qualification"> <text/> </element> </element> </zeroOrMore> </element> </start> </grammar>
Our first schema is now complete and we can use it to validate our instance document using, for instance, jing, the Java open source implementation of RELAX NG written by James Clark.
To make it more comparable to the W3C XML Schema, we need to see how a datatype system can be embedded. This is done by declaring which datatype system we will use and replacing the "text" elements by "data" elements. The editors of RELAX NG believe that there can be no universal datatype system and that, beyond some very basic universal types, each application domain has its own requirements. RELAX NG defines a generic mechanism for plugging in external type systems. The current implementations support W3C XML Schema datatypes. To use this datatype system in our schema, we will update it and write:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" > <start> <element name="book"> <attribute name="isbn"> <data type="nonNegativeInteger"/> </attribute> <element name="title"> <data type="token"/> </element> <element name="author"> <data type="token"/> </element> <zeroOrMore> <element name="character"> <element name="name"> <data type="token"/> </element> <optional> <element name="friend-of"> <data type="token"/> </element> </optional> <element name="since"> <data type="date"/> </element> <element name="qualification"> <data type="token"/> </element> </element> </zeroOrMore> </element> </start> </grammar>
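To illustrate what the datatype declarations add, here is a deliberately invalid variant of the instance document. It is an illustration only, not part of the original tutorial. A validator that supports the W3C XML Schema datatype library should accept it against the untyped schema (where every value is just text) and reject it against the typed schema above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- isbn contains a letter, so it is not a nonNegativeInteger -->
<book isbn="0836217462X">
  <title>Being a Dog Is a Full-Time Job</title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <!-- month 13 does not exist, so this is not a valid date -->
    <since>1950-13-04</since>
    <qualification>extroverted beagle</qualification>
  </character>
</book>
```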
Slicing the Schema
[ Corresponding chapter for W3C XML Schema]
With W3C XML Schema, we saw how we could slice the schema using references to global elements and attributes. The way to do it with RELAX NG is similar, yet different: RELAX NG does not have the notion of global or local elements but allows the definition and referencing of implicit containers that act like the W3C XML Schema element and attribute groups, while being more flexible.
Definitions are made using "define" elements and references using "ref" elements. These constructs are almost semantically neutral: you can include almost anything in a "define", and each reference is replaced by whatever you included in that "define". There is still just enough semantics to allow recursive constructions, but a "define" can be used indiscriminately to include an element, an attribute, a group of elements, a group of attributes, a group of elements and attributes, and so on.
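As a minimal sketch of the recursive constructions mentioned above, consider a pattern that refers to itself. The section element and its title child are hypothetical names chosen for the illustration and are not part of the book vocabulary. The recursion passes through an element, so each level of nesting consumes one section element in the instance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="section"/>
  </start>
  <!-- Hypothetical pattern: a section holds a title, then any number of nested sections -->
  <define name="section">
    <element name="section">
      <element name="title">
        <text/>
      </element>
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </define>
</grammar>
```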
If we want to mimic the slicing of our schema into elements and attributes, we can use this mechanism to create a definition isolating each element and attribute, such as:
<define name="isbn">
  <attribute name="isbn">
    <data type="nonNegativeInteger"/>
  </attribute>
</define>
<define name="title">
  <element name="title">
    <data type="token"/>
  </element>
</define>
And we can use these definitions to construct elements, such as
<define name="character"> <element name="character"> <ref name="name"/> <optional> <ref name="friend-of"/> </optional> <ref name="since"/> <ref name="qualification"/> </element> </define> <element name="book"> <ref name="isbn"/> <ref name="title"/> <ref name="author"/> <zeroOrMore> <ref name="character"/> </zeroOrMore> </element>
It is important to note that although this construction can be used to imitate the behavior of W3C XML Schema global elements and attributes, RELAX NG's define and ref are not element and attribute definitions, but named patterns, making them much more flexible. In the snippet just shown above, I could have chosen for instance to include the cardinality of the character element within the book element by moving the "zeroOrMore" from the reference to the definition and written
<define name="characters"> <zeroOrMore> <element name="character"> <ref name="name"/> <optional> <ref name="friend-of"/> </optional> <ref name="since"/> <ref name="qualification"/> </element> </zeroOrMore> </define> <element name="book"> <ref name="isbn"/> <ref name="title"/> <ref name="author"/> <ref name="characters"/> </element>
The flat schema, comparable to our W3C XML Schema, would be
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" > <define name="isbn"> <attribute name="isbn"> <data type="nonNegativeInteger"/> </attribute> </define> <define name="title"> <element name="title"> <data type="token"/> </element> </define> <define name="author"> <element name="author"> <data type="token"/> </element> </define> <define name="name"> <element name="name"> <data type="token"/> </element> </define> <define name="friend-of"> <element name="friend-of"> <data type="token"/> </element> </define> <define name="since"> <element name="since"> <data type="date"/> </element> </define> <define name="qualification"> <element name="qualification"> <data type="token"/> </element> </define> <define name="character"> <element name="character"> <ref name="name"/> <optional> <ref name="friend-of"/> </optional> <ref name="since"/> <ref name="qualification"/> </element> </define> <start> <element name="book"> <ref name="isbn"/> <ref name="title"/> <ref name="author"/> <zeroOrMore> <ref name="character"/> </zeroOrMore> </element> </start> </grammar>
Defining Named Types
[ Corresponding chapter for W3C XML Schema]
RELAX NG is all about patterns and has no notion of "simple types" or "complex types". Here again, we will need to use define and ref to simulate these W3C XML Schema features. The closest way to simulate a datatype is to define the pattern corresponding to the content model of an element or attribute. This can be done for simple types and, depending on the datatype library and implementation you are using, you may even have access to the W3C XML Schema facets through type "parameters".
<define name="nameType">
  <data type="token">
    <param name="maxLength">32</param>
  </data>
</define>
<define name="isbnType">
  <data type="nonNegativeInteger">
    <param name="pattern">[0-9]{10}</param>
  </data>
</define>
Pseudo-complex types can be defined in the same way.
<define name="bookType"> <attribute name="isbn"> <ref name="isbnType"/> </attribute> <element name="title"> <ref name="titleType"/> </element> <element name="author"> <ref name="authorType"/> </element> <zeroOrMore> <element name="character"> <ref name="characterType"/> </element> </zeroOrMore> </define>
The full schema using exclusively this "style" would be
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" > <define name="isbnType"> <data type="nonNegativeInteger"> <param name="pattern">[0-9]{10}</param> </data> </define> <define name="titleType"> <data type="token"/> </define> <define name="authorType"> <data type="token"/> </define> <define name="nameType"> <data type="token"> <param name="maxLength">32</param> </data> </define> <define name="friend-ofType"> <data type="token"/> </define> <define name="sinceType"> <data type="date"/> </define> <define name="qualificationType"> <data type="token"/> </define> <define name="characterType"> <element name="name"> <ref name="nameType"/> </element> <optional> <element name="friend-of"> <ref name="friend-ofType"/> </element> </optional> <element name="since"> <ref name="sinceType"/> </element> <element name="qualification"> <ref name="qualificationType"/> </element> </define> <define name="bookType"> <attribute name="isbn"> <ref name="isbnType"/> </attribute> <element name="title"> <ref name="titleType"/> </element> <element name="author"> <ref name="authorType"/> </element> <zeroOrMore> <element name="character"> <ref name="characterType"/> </element> </zeroOrMore> </define> <start> <element name="book"> <ref name="bookType"/> </element> </start> </grammar>
Groups, Compositors and Derivation
[ Corresponding chapter for W3C XML Schema]
Groups
Of course we can use RELAX NG's define and ref to simulate element and attribute groups and write
<define name="bookAttributes"> <attribute name="isbn"> <data type="nonNegativeInteger"/> </attribute> <optional> <attribute name="available"> <data type="token"/> </attribute> </optional> </define> <define name="mainBookElements"> <element name="title"> <data type="token"/> </element> <element name="author"> <data type="token"/> </element> </define> .../... <element name="book"> <ref name="bookAttributes"/> <ref name="mainBookElements"/> <zeroOrMore> <ref name="character"/> </zeroOrMore> </element>
Here again we have more flexibility since our pattern definition may mix both element and attribute definitions.
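A minimal sketch of such a mixed definition follows. The pattern name bookHeader is an assumption made for the illustration, and the fragment assumes that an enclosing grammar element declares the W3C XML Schema datatype library, as in the earlier full schemas.

```xml
<!-- Hypothetical pattern mixing an attribute and two elements in one definition -->
<define name="bookHeader">
  <attribute name="isbn">
    <data type="nonNegativeInteger"/>
  </attribute>
  <element name="title">
    <data type="token"/>
  </element>
  <element name="author">
    <data type="token"/>
  </element>
</define>

<element name="book">
  <ref name="bookHeader"/>
  <zeroOrMore>
    <ref name="character"/>
  </zeroOrMore>
</element>
```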
Element and attribute groups are the W3C XML Schema features closest to RELAX NG's named patterns (define and ref). The advice given by Kohsuke Kawaguchi in his article W3C XML Schema Made Simple, to avoid W3C XML Schema complex types and rely on element and attribute groups instead, may be seen as a way to follow a RELAX NG style in W3C XML Schema.
Compositors
As we've seen in all our examples up to now, a "sequence" compositor is implicit and the default behavior for RELAX NG.
Choices are defined using a choice element, and group elements can be used as containers:
<define name="nameTypes"> <choice> <element name="name"> <data type="token"/> </element> <group> <element name="firstName"> <data type="token"/> </element> <optional> <element name="middleName"> <data type="token"/> </element> </optional> <element name="lastName"> <data type="token"/> </element> </group> </choice> </define>
The choice element can also be used for simple types, giving a coherent way to define enumerations, independent of the datatype library in use.
<define name="availableType">
  <choice>
    <value>stock</value>
    <value>order</value>
    <value>na</value>
  </choice>
</define>
RELAX NG is free of most of the limitations of W3C XML Schema concerning non-determinism, and choice can be used to differentiate elements having the same name and different content models. For example, consider the following, which would allow a name to be either plain text or have three subelements.
<element name="name">
  <choice>
    <text/>
    <group>
      <element name="firstName">
        <data type="token"/>
      </element>
      <optional>
        <element name="middleName">
          <data type="token"/>
        </element>
      </optional>
      <element name="lastName">
        <data type="token"/>
      </element>
    </group>
  </choice>
</element>
Elements and attributes are managed, as far as possible, in a consistent way and choice can also be used to allow a value to be placed either in an attribute or in an element (this is something RDF permits, for example).
<element name="book">
  <choice>
    <attribute name="isbn">
      <data type="integer"/>
    </attribute>
    <element name="isbn">
      <data type="integer"/>
    </element>
  </choice>
  .../...
</element>
The equivalent of the W3C XML Schema xs:all element is the interleave element, which defines unordered lists of subelements. The interleave element has none of the xs:all limitations: we could use it in all our complex content definitions to define a schema in which the order of the subelements is not tested at all. This would not be possible with W3C XML Schema because of the "zeroOrMore" number of character elements included in the book element.
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0"> <start> <element name="book"> <interleave> <attribute name="isbn"> <text/> </attribute> <element name="title"> <text/> </element> <element name="author"> <text/> </element> <zeroOrMore> <element name="character"> <interleave> <element name="name"> <text/> </element> <optional> <element name="friend-of"> <text/> </element> </optional> <element name="since"> <text/> </element> <element name="qualification"> <text/> </element> </interleave> </element> </zeroOrMore> </interleave> </element> </start> </grammar>
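As an illustration, the following reordered instance (my own variation on the example document) should be accepted by the interleave schema above, whereas the earlier sequence-based schemas would reject it because the children no longer appear in the declared order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<book isbn="0836217462">
  <!-- character before author and title: allowed by the outer interleave -->
  <character>
    <qualification>extroverted beagle</qualification>
    <name>Snoopy</name>
    <since>1950-10-04</since>
  </character>
  <author>Charles M. Schulz</author>
  <title>Being a Dog Is a Full-Time Job</title>
</book>
```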
Derivation of simple types
The notion of simple types is imported from datatype libraries. Although predefined datatypes may be used to derive new types -- by restriction of predefined types and usage of parameters -- the result of such a derivation is considered a RELAX NG pattern: it loses its simple-type semantics and cannot be derived by restriction any further. Derivation by union is done using a choice element, and the equivalent of our W3C XML Schema example would be
<define name="isbnType">
  <choice>
    <data type="nonNegativeInteger">
      <param name="pattern">[0-9]{10}</param>
    </data>
    <value>TBD</value>
    <value>NA</value>
  </choice>
</define>
Derivation by list can be done using the list element, which applies to simple types only. Unlike its W3C XML Schema counterpart, the RELAX NG list operator lets us combine several datatypes within a list. For example, to define a length with an optional unit, we could write
<define name="lengthType">
  <list>
    <data type="integer"/>
    <optional>
      <choice>
        <value>pixel</value>
        <value>in</value>
        <value>cm</value>
      </choice>
    </optional>
  </list>
</define>
On the other hand, RELAX NG doesn't provide the granularity available in W3C XML Schema to define cardinalities. So, to define a list of up to ten ISBN codes, we could take the loose route, and define a list of "zeroOrMore" isbnType:
<define name="isbnTypes">
  <list>
    <zeroOrMore>
      <ref name="isbnType"/>
    </zeroOrMore>
  </list>
</define>
Or we could explicitly (and verbosely) show the limit:
<define name="isbn10Types">
  <list>
    <optional>
      <ref name="isbnType"/>
      <optional>
        <ref name="isbnType"/>
        <optional>
          <ref name="isbnType"/>
          <optional>
            <ref name="isbnType"/>
            <optional>
              <ref name="isbnType"/>
              <optional>
                <ref name="isbnType"/>
                <optional>
                  <ref name="isbnType"/>
                  <optional>
                    <ref name="isbnType"/>
                    <optional>
                      <ref name="isbnType"/>
                      <optional>
                        <ref name="isbnType"/>
                      </optional>
                    </optional>
                  </optional>
                </optional>
              </optional>
            </optional>
          </optional>
        </optional>
      </optional>
    </optional>
  </list>
</define>
Content Types
[ Corresponding chapter for W3C XML Schema]
All the content types are handled through patterns, and there is no difference between the ways one constructs simple, complex, mixed, or empty content models with RELAX NG.
To define empty content models, you just leave the content empty (i.e. omit any embedded text, value or element declaration):
<element name="book"> <attribute name="isbn"> <ref name="isbnType"/> </attribute> </element>
To define a simple content model, you just omit the declaration of any embedded element:
<element name="book"> <attribute name="isbn"> <ref name="isbnType"/> </attribute> <text/> </element>
To define a complex content model (not mixed), you just omit the declaration of any embedded text:
<element name="book"> <attribute name="isbn"> <text/> </attribute> <element name="title"> <text/> </element> <element name="author"> <text/> </element> </element>
To define mixed content models you declare both embedded text and elements:
<element name="book"> <attribute name="isbn"> <text/> </attribute> <interleave> <element name="title"> <text/> </element> <element name="author"> <text/> </element> <zeroOrMore> <text/> </zeroOrMore> </interleave> </element>
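RELAX NG also offers a mixed element as a shorthand for interleaving a pattern with text. The sketch below is intended to be equivalent to the definition above; the inner interleave is kept so that title and author may still appear in either order, as in the original.

```xml
<element name="book">
  <attribute name="isbn">
    <text/>
  </attribute>
  <!-- mixed is shorthand for interleaving its content with text -->
  <mixed>
    <interleave>
      <element name="title">
        <text/>
      </element>
      <element name="author">
        <text/>
      </element>
    </interleave>
  </mixed>
</element>
```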
Since text nodes are handled like elements and attributes, their individual location and type can be defined and constrained, something which isn't possible with W3C XML Schema. Suppose we have an element p containing lines terminated by empty br elements, and that we want to disallow blank lines. We can write
<element name="p">
  <zeroOrMore>
    <text/>
    <element name="br">
      <empty/>
    </element>
  </zeroOrMore>
  <optional>
    <text/>
  </optional>
</element>
Constraints
[ Corresponding chapter for W3C XML Schema]
This chapter will be a short one. RELAX NG does not provide any feature to define keys and key references. A workaround can be to embed Schematron rules enforcing these constraints into your RELAX NG schema as supported by some implementations.
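As a rough sketch of this workaround, a Schematron rule can be placed as a foreign element inside a RELAX NG definition. This assumes the Schematron 1.5 namespace and a processor that actually evaluates embedded rules; a plain RELAX NG processor will simply ignore the s:pattern element. The rule itself is my own illustration: it checks that each friend-of value matches the name of some character in the document, and it compares values without normalizing whitespace.

```xml
<element name="friend-of"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:s="http://www.ascc.net/xml/schematron">
  <!-- Foreign element: only Schematron-aware processors act on it -->
  <s:pattern name="friend-of must name an existing character">
    <s:rule context="friend-of">
      <!-- True when at least one character name equals this friend-of value -->
      <s:assert test="//character/name = .">
        friend-of must match the name of a character in the book.
      </s:assert>
    </s:rule>
  </s:pattern>
  <text/>
</element>
```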
Building Usable -- and Reusable -- Schemas
[ Corresponding chapter for W3C XML Schema]
RELAX NG doesn't include anything similar to the xs:annotation W3C XML Schema element. Instead it provides much less or much more, depending on how you want to consider it. RELAX NG processors simply ignore any element or attribute with a namespace URI different from the RELAX NG namespace URI. So we can just help ourselves and use any element from any namespace we like to annotate RELAX NG Schemas. An example (similar to what we've seen with W3C XML Schema) could be
<element name="book"> <annotation xmlns="http://example.com/doc"> <documentation xml:lang="en"> Top level element. </documentation> <documentation xml:lang="fr"> Element racine. </documentation> <appinfo source="http://example.com/foo/"> <bind xmlns="http://example.com/bar/"> <class name="Book"/> </bind> </appinfo> </annotation> .../... </element>
Composing schemas from multiple files
The RELAX NG include element provides the same kind of functionality as W3C XML Schema's xs:include and xs:redefine: you can use it without any child elements to perform a straight inclusion of the patterns defined in another RELAX NG schema, or embed pattern redefinitions in the include element. The W3C XML Schema restriction that any redefinition has to be a valid derivation of the base type doesn't exist in RELAX NG: the redefinition may be as different from the base pattern as you want it to be. Our W3C XML Schema example would then become, in RELAX NG:
<include href="foo.rng">
  <define name="nameType">
    <data type="token">
      <param name="maxLength">40</param>
    </data>
  </define>
</include>
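For comparison, a straight inclusion without any redefinition might look like the following sketch. It assumes that foo.rng (the file name used above) defines a bookType pattern and does not declare a start pattern of its own, so that the including grammar can supply one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- No child elements: the definitions in foo.rng are used unchanged -->
  <include href="foo.rng"/>
  <start>
    <element name="book">
      <!-- bookType is assumed to be defined in foo.rng -->
      <ref name="bookType"/>
    </element>
  </start>
</grammar>
```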
Substitution groups and abstract elements
There are no substitution groups in RELAX NG. However, the combine attribute can play a similar role by defining which rule applies when a pattern is defined more than once. This attribute, which belongs to the define element, may take the values "choice" or "interleave". When its value is "choice", a reference to the pattern may match either definition. For example, a reference to the following definition will accept either a name or a surname element.
<define name="name" combine="choice">
  <element name="name">
    <text/>
  </element>
</define>
<define name="name" combine="choice">
  <element name="surname">
    <text/>
  </element>
</define>
Since the names of the definitions and references are naming RELAX NG patterns, and not elements or attributes, there is no need here to define abstract elements.
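The other value of the combine attribute, "interleave", merges the definitions so that a reference must match all of them, in any order. A minimal sketch, using a hypothetical bookAttributes pattern built up from two separate definitions:

```xml
<!-- First definition, for instance in a base schema -->
<define name="bookAttributes" combine="interleave">
  <attribute name="isbn">
    <text/>
  </attribute>
</define>

<!-- Second definition, adding an optional attribute to the same pattern -->
<define name="bookAttributes" combine="interleave">
  <optional>
    <attribute name="available">
      <text/>
    </attribute>
  </optional>
</define>
```

A reference to bookAttributes then requires the isbn attribute and allows the available attribute, which makes this a convenient way to extend a pattern without editing its original definition.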
Final types
The notion of final type is closely bound to the W3C XML Schema derivation features and does not apply to RELAX NG, which is focused on defining reusable patterns.
Namespaces
[ Corresponding chapter for W3C XML Schema]
RELAX NG does not require a schema to define one and only one namespace. Several namespaces may be defined in a single schema, and, conversely, a namespace may be described by several schemas. The namespace of the elements and attributes being described can be indicated either with an ns attribute or with qualified names.
The ns attribute is inherited by all the child elements of the element where it is defined. The simplest way to change our first schema to place it in the http://example.org/book namespace would just be to add the ns attribute in the document element:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" ns="http://example.org/book"> <start> <element name="book"> <attribute name="isbn"> <text/> </attribute> <element name="title"> <text/> </element> <element name="author"> <text/> </element> <zeroOrMore> <element name="character"> <element name="name"> <text/> </element> <optional> <element name="friend-of"> <text/> </element> </optional> <element name="since"> <text/> </element> <element name="qualification"> <text/> </element> </element> </zeroOrMore> </element> </start> </grammar>
Note that the ns attribute is defaulted in the same way as the default namespace in XML: it doesn't apply to attributes, so this schema expects the isbn attribute to be unqualified. If we want to specify that it is qualified, we need to add the ns attribute to the attribute definition and write:
<attribute name="isbn" ns="http://example.org/book">
  <text/>
</attribute>
Importing definitions from external namespaces
Qualified names (qnames) can also be used for elements and attributes. Since no namespace is attached to a RELAX NG schema, however, patterns cannot be referenced or defined using qualified names the way W3C XML Schema types or groups are. When using qnames, RELAX NG applies the same rule as XPath 1.0 and doesn't use the default namespace (the one declared in the RELAX NG schema considered as an XML document). Qnames are aimed at making RELAX NG schemas less verbose when they mix several namespaces. For instance, to accept an xml:lang attribute in our title, we could write either
<element name="title">
  <text/>
  <attribute name="lang" ns="http://www.w3.org/XML/1998/namespace">
    <data type="language"/>
  </attribute>
</element>
or
<element name="title" xmlns:xml="http://www.w3.org/XML/1998/namespace">
  <text/>
  <attribute name="xml:lang">
    <data type="language"/>
  </attribute>
</element>
We could import the definition of the xml:lang attribute from another schema using the include element as we've seen earlier, but this is not mandatory.
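Qnames work the same way for elements. The following sketch mixes two namespaces in a single definition; the bk prefix and the use of Dublin Core elements for the title and creator are assumptions made for the illustration, not part of the original example.

```xml
<element name="bk:book"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:bk="http://example.org/book"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Unprefixed attribute name: the attribute is expected to be unqualified -->
  <attribute name="isbn">
    <text/>
  </attribute>
  <!-- Elements taken from a second namespace through their prefix -->
  <element name="dc:title">
    <text/>
  </element>
  <element name="dc:creator">
    <text/>
  </element>
</element>
```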
Including unknown elements
As with W3C XML Schema, the inclusion of unknown elements is done through wildcards, but unlike W3C XML Schema, these wildcards are wildcards on the element and attribute names and not on the elements and attributes themselves. These wildcards (called "name classes") accept an except element, something missing from W3C XML Schema. The following shows the equivalent of the W3C complex type definition that allows any XHTML element within a mixed content element (the wildcards used here are "nsName", which means any name from a namespace and "anyName", which means any name from any namespace).
<define name="descType"> <zeroOrMore> <choice> <text/> <element> <nsName ns="http://www.w3.org/1999/xhtml"/> <ref name="anyThing"/> </element> </choice> </zeroOrMore> </define> <define name="anyThing"> <zeroOrMore> <choice> <text/> <attribute> <anyName/> </attribute> <element> <anyName/> <ref name="anyThing"/> </element> </choice> </zeroOrMore> </define>
Note that RELAX NG doesn't include any predefined processing model (the equivalent of W3C XML Schema's "lax", "strict" or "skip") but since the wildcards are wildcards on the names only, you can and must define their content model and can choose to define what you like. Here we have defined a pattern named "anyThing", which we use as content model for the elements we accept from the XHTML namespace, which means that in W3C XML Schema terms we have a "skip" processing model.
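The except element mentioned above can be sketched as follows. This hypothetical foreignElements pattern accepts any element except those in the http://example.org/book namespace or in no namespace, and reuses the anyThing pattern defined earlier as its content model.

```xml
<define name="foreignElements">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <!-- Exclude our own vocabulary and unqualified elements -->
          <nsName ns="http://example.org/book"/>
          <nsName ns=""/>
        </except>
      </anyName>
      <ref name="anyThing"/>
    </element>
  </zeroOrMore>
</define>
```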
Schema and Instance Documents
[ Same chapter for W3C XML Schema]
This section doesn't apply to RELAX NG, which doesn't interfere with instance documents.
Concluding Comments
Throughout this comparison, we have seen that one of the main differences between the two languages is a matter of style: while RELAX NG focuses on generic "patterns", W3C XML Schema has differentiated these patterns into a set of distinct components (elements, attributes, groups, complex and simple types). The result is on one side a language which is lightweight and flexible (RELAX NG) and on the other side a language which gives more "meaning" or "semantic" to the components that it manipulates (W3C XML Schema). The question of whether the added features are worth the price in terms of complexity and rigidity is open, and the answer probably depends on the applications.
Independently of this first difference between the two, the different positions regarding "non-determinism" between RELAX NG, which accepts most of the constructs a designer can imagine, and W3C XML Schema, which is very strict, mean that a number of vocabularies which can be described by RELAX NG cannot be described by W3C XML Schema.
A way to summarize this is to notice that an implementation such as MSV (the "Multi Schema Validator" developed by Kohsuke Kawaguchi for Sun Microsystems) uses a RELAX NG internal representation as a basis to represent the grammar described in W3C XML Schema and DTD schemas. This seems to indicate that RELAX NG can be used as the base on which object oriented features such as those of W3C XML Schema can be implemented. The value of an XML-specific object-oriented layer is still to be determined, though, since generic object-oriented tools should be able to generate RELAX NG schemas directly.