RELAX NG's Compact Syntax
Working with XML Schema is like driving a limousine. It's true that it has some nice appointments (datatypes come to mind), but the wheelbase is a bit on the long side, making it difficult to take corners, and I am inclined to let somebody else do the driving for me. Using RELAX NG, on the other hand, is like driving a sports car. It holds corners amazingly well, and I am much less interested in handing over the keys to anyone. You may prefer to drive a limo over a sports car. But I'll take the sports car any day.
You are probably familiar with XML Schema and RELAX NG. Both are schema languages for XML. The former was released by the W3C in May 2001, while the latter was released in December 2001 by OASIS. RELAX NG, which was developed by a small technical committee led by James Clark, merges Murata Makoto's RELAX and Clark's TREX. It is a simple yet elegant evolution of the DTD, and it is easy to learn. It is modular in design. The core of RELAX NG is focused on validation alone and doesn't modify the infoset in the process of validation; in other words, there is no PSVI (post-schema-validation infoset). RELAX NG is also part of an ISO draft standard, ISO/IEC DIS 19757-2.
RELAX NG schemas were originally written in XML, but there's also a compact, non-XML syntax. While this article doesn't contain an exhaustive review of all the features of RELAX NG, it will give you a good idea of how to use the main parts of the compact syntax. If you don't know much about RELAX NG, I suggest that you read Eric van der Vlist's RELAX NG Compared before finishing this article.
I think you'll find the compact syntax quite readable and easy to learn. In some respects, a RELAX NG schema in compact form looks like a context-free grammar, which gives a familiar view of the language, is readily comprehensible, and is amenable to parsing. Also, don't be surprised if the compact syntax bears a resemblance to the syntax of XDuce and to XQuery's computed element and attribute constructors.
The first exhibit is a well-formed XML document which represents an ISO 8601 date:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE date SYSTEM "date.dtd">
<date type="ISO8601">
<year>2002</year>
<month>06</month>
<day>01</day>
</date>
This document is an instance of the following DTD:
<!ELEMENT date (year, month, day)>
<!ATTLIST date type CDATA #IMPLIED>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
It is also valid with regard to the following RELAX NG schema in XML syntax:
<?xml version="1.0" encoding="UTF-8"?>
<element name="date"
xmlns="http://relaxng.org/ns/structure/1.0">
<optional>
<attribute name="type"/>
</optional>
<element name="year"><text/></element>
<element name="month"><text/></element>
<element name="day"><text/></element>
</element>
And here is a version of the schema in RELAX NG's compact syntax:
element date { attribute type { text }?,
element year { text },
element month { text },
element day { text } }
No pointy brackets! The RELAX NG schema in XML syntax has the same meaning as the compact schema, but the compact schema is less than half the length. For those who develop schemas without the aid of a dedicated editor (not much exists yet for RELAX NG anyway), that means considerably less time tapping at the keyboard. Still and all, the compact syntax has more advantages than just concision.
In these simple examples, the equivalencies are apparent, but I'll mention a few things about them anyway. Take, for example, the declaration or definition of elements. In the DTD, an element type declaration takes the form:
<!ELEMENT year (#PCDATA)>
In RELAX NG's XML syntax, the same element definition looks like
<element name="year"><text/></element>
This definition is nicely trimmed down in the compact syntax:
element year { text }
Compact attributes, likewise, have gone on a diet. The attribute list
declaration for the implied (optional) attribute type looks like
this in the DTD:
<!ATTLIST date type CDATA #IMPLIED>
In RELAX NG's XML syntax, an attribute definition looks like this:
<optional>
<attribute name="type"/>
</optional>
This is equivalent to
<optional>
<attribute name="type"><text/></attribute>
</optional>
When an attribute definition in the XML syntax has no content, its value
is assumed to be text (what was CDATA in the DTD is now
<text/>). The placement of the attribute definition in RELAX NG follows the
structure of the XML document instance, which is one reason why the syntax of
RELAX NG is rather intuitive. Unlike the RELAX NG XML syntax, the compact
definition of the attribute
attribute type { text }?
must use the text token. The ? repetition operator
(zero or one <optional>) descends from regular expression
notation by way of DTDs, as do the operators *
(<zeroOrMore>) and +
(<oneOrMore>). The comma (,), sometimes called the sequence operator,
separates patterns that must appear in the order given; a pattern with no
repetition operator must appear exactly once. (Only
? makes sense as an operator for attributes because XML only allows
one specification of any given attribute in a start tag.)
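One way to see what these operators mean is to treat a content model as a pattern over the sequence of child element names. The following is a minimal Python sketch, not taken from any RELAX NG tool: the helper names `content_regex` and `matches` are hypothetical, and the sketch deliberately ignores attributes, text, interleave, and nesting. It turns a list of (name, operator) pairs into a regular expression, so the backtracking regex engine handles cases like `month+, day*` correctly.

```python
import re


def content_regex(particles):
    """Build a regex for a sequence of (element_name, operator) particles.

    operator is "" (exactly once), "?" (zero or one), "*" (zero or more)
    or "+" (one or more), mirroring the compact-syntax operators.
    """
    parts = []
    for name, op in particles:
        if op not in ("", "?", "*", "+"):
            raise ValueError(f"unknown operator: {op!r}")
        # Each child name is matched as a whole word followed by one space.
        parts.append(f"(?:{re.escape(name)} ){op}")
    return re.compile("".join(parts))


def matches(particles, child_names):
    """True if the child element names satisfy the sequence, in order."""
    text = "".join(f"{name} " for name in child_names)
    return content_regex(particles).fullmatch(text) is not None
```

For the first compact schema, `matches([("year", ""), ("month", ""), ("day", "")], ["year", "month", "day"])` is True, while the same three children in any other order are rejected, because the comma fixes the order.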
To translate a compact-syntax schema to XML syntax, use James Clark's Trang. Assuming
that you have downloaded both Jing and Trang, have
a Java runtime environment in place, and have
placed the JAR files in the path and classpath (see
instructions on this for
Unix or
Windows if you are shaky on such terms), you can convert compact syntax to
XML with the following command:
java -jar trang.jar date.rnc date.rng
Trang requires two arguments: an input and output file. File formats are
inferred by Trang according to the file suffixes (rng for XML
syntax, rnc for compact). You can override this behavior with the
-i option (rnc or rng) for input files
and the -o option for output files (rng or
dtd). As you can see, in addition to XML format, you can also
specify DTD output from Trang. (Incidentally, you can also use DTDinst to translate a DTD to
a schema in RELAX NG XML syntax.) Further, you can indicate Trang's output
encoding with the -e option (for example, -e
ISO-8859-1); if the -e option is not present, the output
file is written in UTF-8 by default. Trang, by the way, automatically inserts
the namespace declaration for RELAX NG itself.
With Jing, you can conveniently validate an instance against a compact schema
directly, without transforming it to XML syntax, by using the -c
option:
java -jar jing.jar -c date.rnc date.xml
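If you run the two tools often, a small wrapper can save some typing. This Python sketch only assembles the same command lines shown above; it assumes `trang.jar` and `jing.jar` sit in the current directory and that Jing reports an invalid document with a nonzero exit status. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def trang_command(source, target, jar="trang.jar"):
    """Build a Trang command; Trang infers formats from the file suffixes."""
    return ["java", "-jar", jar, source, target]


def jing_command(schema, instance, jar="jing.jar"):
    """Build a Jing command, adding -c when the schema is in compact syntax."""
    cmd = ["java", "-jar", jar]
    if Path(schema).suffix == ".rnc":
        cmd.append("-c")
    return cmd + [schema, instance]


def run(cmd):
    """Run the command; True if it exits with status 0 (assumed to mean valid)."""
    return subprocess.run(cmd).returncode == 0
```

With Java and the JAR files in place, `run(jing_command("date.rnc", "date.xml"))` performs the same validation as the command above.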
There are a few additional features worth highlighting. The following compact
schema contains a definition for a default namespace for elements in an
instance, adds a comment (the line prepended with #), and changes
the content model of the child elements of date:
default namespace = "http://www.example.com/date"
# RELAX NG schema for a date
element date { attribute type { text },
( element year { text } &
element month { text } &
element day { text } ) }
Notice that the ? operator was dropped from the attribute
definition, so the type attribute is now required. The ampersand
(&) indicates that adjoining elements are
interleaved, that is, these elements can appear in any order in an
instance. Parentheses enclose these element definitions. When the schema is
translated, the compact comment will appear as a normal XML comment:
<!-- RELAX NG schema for a date -->
The default namespace for the instance will become an ns
attribute in the document element of the schema:
<element name="date" ns="http://www.example.com/date"
xmlns="http://relaxng.org/ns/structure/1.0">
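To make the interleave and namespace rules concrete, here is a hand-written Python check for this one schema, using the standard library's `xml.etree.ElementTree`, which reports namespaced names in `{uri}local` form. `valid_date` is a hypothetical helper, a sketch rather than a general RELAX NG validator. Note that the unprefixed `type` attribute stays in no namespace even though the elements are in the default namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The default namespace declared in the compact schema.
NS = "{http://www.example.com/date}"


def valid_date(xml_text):
    """Hand-written check for the interleaved date schema (not a RELAX NG validator)."""
    root = ET.fromstring(xml_text)
    # date must be in the default namespace and carry exactly one attribute, type.
    if root.tag != NS + "date" or set(root.attrib) != {"type"}:
        return False
    # Only whitespace may appear between the child elements.
    stray = [root.text] + [child.tail for child in root]
    if any(s and s.strip() for s in stray):
        return False
    # Interleave (&): year, month, and day each exactly once, in any order.
    expected = Counter(NS + name for name in ("year", "month", "day"))
    if Counter(child.tag for child in root) != expected:
        return False
    # Each child holds text only: no attributes, no element children.
    return all(len(child) == 0 and not child.attrib for child in root)
```

An instance with `<day>`, `<year>`, and `<month>` in that order passes, while the same document without the `xmlns` declaration fails, because its elements are then in no namespace.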
In RELAX NG, the document element of a schema can be any pattern element
(for example, <grammar>, <element>, or <choice>), so you're not
limited to using <element> at the top.
So far the elements and attributes in the examples have only used
text content. RELAX NG, as a matter of fact, has only two built-in
datatypes: token and string (which is like
text). You can also use XML Schema datatypes.
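The difference between the two built-in types matters mostly when values are compared: RELAX NG compares `token` values after whitespace normalization and `string` values character for character, and a bare literal such as `"US"` in the compact syntax is compared as a `token`. A minimal Python sketch of the two comparisons (the function names are hypothetical):

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_token(value):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return _XML_WS.sub(" ", value).strip(" ")


def token_equal(a, b):
    """Built-in token comparison: equal after whitespace normalization."""
    return normalize_token(a) == normalize_token(b)


def string_equal(a, b):
    """Built-in string comparison: character for character."""
    return a == b
```

So an attribute written as `type=" US "` would still match the literal `"US"`, whereas a `string` comparison would not treat the two as equal.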
In the following compact schema, the datatypes declaration binds a prefix for
the QNames of the datatypes (xsd) to the datatype library URI
(http://www.w3.org/2001/XMLSchema-datatypes), which becomes the value of a
datatypeLibrary attribute in the XML syntax:
# Using XML Schema datatypes
namespace dt = "http://www.example.com/date"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
## An ISO 8601, US, or European date format
element dt:date { attribute type {"ISO8601" | "US" | "Euro"},
( element dt:day { xsd:string { pattern = "\d{2}" } } &
element dt:month { attribute days { "28" | "29" | "30" | "31" }?,
xsd:string { pattern = "\d{2}" } } &
element dt:year { xsd:string { pattern = "\d{4}" } } )
}
You can specify the datatype library and prefix explicitly as shown, or you
can omit this line and allow Trang to insert it automatically during
translation. Either way, the datatypeLibrary attribute will appear
in the document element:
<element name="dt:date"
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
xmlns:dt="http://www.example.com/date"
xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
The elements <dt:month>, <dt:day>, and <dt:year> all have
content of type xsd:string. The pattern parameter specifies a regular
expression that further constrains that content: two digits for <dt:month> and
<dt:day>, four digits for <dt:year>. The optional
attribute days provides a
choice of literals as values. For example, the value of days in
an instance may be one of 28, 29, 30, or
31. Other values are invalid.
<element name="dt:month">
<optional>
<attribute name="days">
<choice>
<value>28</value>
<value>29</value>
<value>30</value>
<value>31</value>
</choice>
</attribute>
</optional>
<data type="string">
<param name="pattern">\d{2}</param>
</data>
</element>
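When you test such patterns outside a schema processor, remember that XML Schema regular expressions are implicitly anchored: `\d{2}` must match the whole value, not just part of it. This Python sketch assumes Python's `re` syntax is close enough to XML Schema's for these simple patterns and uses `fullmatch` to get the anchoring; `valid_month` is a hypothetical helper. For the `days` choice it strips surrounding whitespace, which is equivalent to `token` normalization for single-word values like these.

```python
import re

# XML Schema patterns are implicitly anchored, so use fullmatch, not search.
DAY_OR_MONTH = re.compile(r"\d{2}")
YEAR = re.compile(r"\d{4}")
DAYS_CHOICES = {"28", "29", "30", "31"}


def valid_month(content, days=None):
    """Check <dt:month> content and its optional days attribute."""
    # The days attribute, if present, must be one of the listed literals.
    if days is not None and days.strip(" \t\r\n") not in DAYS_CHOICES:
        return False
    # xsd:string keeps whitespace, so "06 " would fail the two-digit pattern.
    return DAY_OR_MONTH.fullmatch(content) is not None
```

Here `valid_month("06", days="30")` is accepted, while `valid_month("6")` and `valid_month("06", days="32")` are rejected.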
This example also shows how to add an annotation element
(<a:documentation>) with the double hash
(##). This annotation comes from the RELAX NG DTD
compatibility specification and, once translated, looks like
<a:documentation>An ISO 8601, US, or European date
format</a:documentation>
You can also specify annotations with square brackets in the compact form, like
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
[ a:documentation [ "An ISO 8601, US, or European date format" ] ]
When you use this form, you must declare a namespace and prefix for the annotation element. You can insert elements and attributes as annotations from any namespace you like (XHTML, say), as long as the namespace is declared and the bracketed syntax is used. Default attribute values may also be represented as
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
# DTD compatibility requires attributes with defaults to be optional (?)
element today {
[ a:defaultValue = "2002" ]
attribute year { text }?,
[ a:defaultValue = "06" ]
attribute month { text }?,
[ a:defaultValue = "01" ]
attribute day { text }?,
empty }
Notice the empty token at the end of the content model. As you
can probably guess, this keyword signifies that the element today
is an empty element. This syntax is analogous to the DTD syntax:
<!ELEMENT today EMPTY>
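Because core RELAX NG validation leaves the infoset untouched, these defaults are only hints; it is up to an application (or a processor that implements the DTD compatibility spec) to act on them. A minimal Python sketch of what such an application might do, assuming the defaults have already been read from the annotations into a dictionary; `apply_defaults` and `TODAY_DEFAULTS` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Hypothetical table copied by hand from the a:defaultValue annotations.
TODAY_DEFAULTS = {"year": "2002", "month": "06", "day": "01"}


def apply_defaults(xml_text, defaults=TODAY_DEFAULTS):
    """Parse the element and fill in any missing attributes from defaults.

    RELAX NG validation itself never does this; the instance is left
    unchanged unless an application chooses to apply the defaults.
    """
    elem = ET.fromstring(xml_text)
    for name, value in defaults.items():
        # Attributes present in the instance win over the defaults.
        elem.attrib.setdefault(name, value)
    return elem
```

For `<today month="07"/>`, the result keeps `month="07"` and gains `year="2002"` and `day="01"`.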
I mentioned earlier in the article that the compact syntax can look like a
context-free grammar. The following example uses a start symbol and other
symbols that serve as terminals and non-terminals. For example, the symbol
year, on the left side of the equals sign, may be considered a
non-terminal, and the element definition on the right side, a terminal:
# RELAX NG schema for a date
start = date
date = element date { attribute type { text },
(year & month & day), limits*}
year = element year { text }
month = element month { text }
day = element day { text }
include "limits.rnc"
The following instance is valid with regard to the foregoing:
<?xml version="1.0" encoding="UTF-8"?>
<date type="US">
<month>June</month>
<day>1</day>
<year>2002</year>
<limits days="30"/>
</date>
When translated, this example creates a different RELAX NG schema than the
ones shown previously, producing a <grammar> and
<start> element and several <define>
elements, as seen in this incomplete fragment:
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<!-- RELAX NG schema for a date -->
<start>
<ref name="date"/>
</start>
<define name="date">
<element name="date">
<attribute name="type"/>
<interleave>
<ref name="year"/>
<ref name="month"/>
<ref name="day"/>
</interleave>
<zeroOrMore>
<ref name="limits"/>
</zeroOrMore>
</element>
</define>
A <grammar> element is a container for definitions. The
<start> element indicates the document element for an
instance, just as a document type declaration does. The
<define> elements contain patterns which can be referenced by
name (with a <ref> element) and therefore easily reused.
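Since a <ref> is only meaningful when a <define> of the same name exists, one quick sanity check on a translated schema is to look for references with no definition. This Python sketch is a simplification: it treats the whole document as one flat grammar and ignores nested grammars, <parentRef>, and <include>. `undefined_refs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace of RELAX NG's XML syntax, in ElementTree's {uri}local form.
RNG = "{http://relaxng.org/ns/structure/1.0}"


def undefined_refs(grammar_xml):
    """Return the set of ref names that have no matching define."""
    grammar = ET.fromstring(grammar_xml)
    defined = {d.get("name") for d in grammar.iter(RNG + "define")}
    referenced = {r.get("name") for r in grammar.iter(RNG + "ref")}
    return referenced - defined
```

Given a grammar that defines date and year but not limits, it returns `{"limits"}`: the included schema is what supplies that missing definition.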
Back in the compact schema, a symbol for the limits pattern
(modified with *) was added to the end of the content model for
date, but where is it defined? It's defined in the included schema
limits.rnc (see the last line of the last compact example), which
looks like
# Limits for year, months, and days
limits =
element limits {
attribute years { text }?,
attribute months { text }?,
attribute days { text }?
}
When processed, the included compact schema is translated into RELAX NG XML
syntax as well. The resulting filename, limits.rng, is inferred
from limits.rnc. The included pattern contains a definition for the
limits element, which may carry up to three optional
attributes. Because no other content is specified, the element itself must be
empty.
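The same hand-checking approach works for the included pattern. This sketch tests a <limits> element in isolation against the definition in limits.rnc: only the attributes years, months, and days are allowed, all optional, and the content must be empty (whitespace-only text is tolerated, since RELAX NG treats it as matching empty content). `valid_limits` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

LIMITS_ATTRS = {"years", "months", "days"}


def valid_limits(xml_text):
    """Check a <limits> element: optional years/months/days, empty content."""
    elem = ET.fromstring(xml_text)
    # Any subset of the three attributes is fine; anything else is not.
    if elem.tag != "limits" or not set(elem.attrib) <= LIMITS_ATTRS:
        return False
    # Empty content: no child elements and no non-whitespace text.
    return len(elem) == 0 and not (elem.text or "").strip()
```

The `<limits days="30"/>` element from the instance above passes; `<limits weeks="4"/>` does not.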
It would take several more articles to cover all aspects of RELAX NG in fair detail. This article has only touched lightly on its compact syntax and some of the more commonly used structures of the language. I have neglected some interesting things: for example, lists, name classes, merging grammars, and combining definitions. If you've gotten behind the wheel and tested these examples for yourself, you likely have a good feel for just how easy RELAX NG's compact syntax is to learn and use.
Resources
"The Design of RELAX NG," a paper by James Clark
RELAX NG 1.0 DTD compatibility specification
RELAX NG compact syntax specification
Jing, James Clark's RELAX NG processor (Java)
Trang, James Clark's RELAX NG compact syntax processor (Java)
Multi-schema Validator (MSV), Sun's schema validator (by Kawaguchi Kohsuke)
Murata Makoto's online RELAX NG validator (Java/JSP)
Eric van der Vlist's online RELAX NG validator (Python)
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.