RELAX NG's Compact Syntax
by Michael Fitzgerald
Processing Compact Syntax Schemas
To translate a compact-syntax schema to XML syntax, use James Clark's Trang. Assuming
that you have downloaded both Jing and Trang, have
a Java runtime environment in place, and have
placed the JAR files in the path and classpath (see
instructions on this for
Unix or
Windows if you are shaky on such terms), you can convert compact syntax to
XML with the following command:
java -jar trang.jar date.rnc date.rng
Trang requires two arguments: an input file and an output file. Trang infers the
file formats from the file suffixes (.rng for XML syntax, .rnc for compact
syntax). You can override this behavior with the -i option (rnc or rng) for
input files and the -o option (rng or dtd) for output files. In other words,
besides XML syntax, Trang can also produce a DTD. (Incidentally, you can use
DTDinst to translate a DTD into a schema in RELAX NG XML syntax.) You can also
set Trang's output encoding with the -e option (for example, -e
ISO-8859-1); if the -e option is not present, the output file is written in
UTF-8 by default. Trang also automatically inserts the namespace declaration
for RELAX NG itself.
With Jing, you can conveniently validate an instance against a compact schema
directly, without transforming it to XML syntax, by using the -c
option:
java -jar jing.jar -c date.rnc date.xml
Extending the Example
There are a few additional features worth highlighting. The following compact
schema contains a definition for a default namespace for elements in an
instance, adds a comment (the line beginning with #), and changes
the content model of the child elements of date:
default namespace = "http://www.example.com/date"
# RELAX NG schema for a date
element date { attribute type { text },
( element year { text } &
element month { text } &
element day { text } ) }
Notice that the ? operator was dropped from the attribute
definition, so the type attribute is now required. The ampersand
(&) indicates that adjoining elements are
interleaved; that is, these elements can appear in any order in an
instance. Parentheses group these interleaved element definitions. When the schema is
translated, the compact comment will appear as a normal XML comment:
<!-- RELAX NG schema for a date -->
The default namespace for the instance will become an ns
attribute in the document element of the schema:
<element name="date" ns="http://www.example.com/date"
xmlns="http://relaxng.org/ns/structure/1.0">
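For reference, here is a hand-written XML-syntax schema that should be equivalent to the complete extended compact schema. It is a sketch, not Trang's verbatim output, which may differ in details such as indentation or whether <text/> is written out inside <attribute>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- RELAX NG schema for a date -->
<element name="date" ns="http://www.example.com/date"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- required: no <optional> wrapper, because the ? was dropped -->
  <attribute name="type">
    <text/>
  </attribute>
  <!-- the compact & operator becomes <interleave>: any order is allowed -->
  <interleave>
    <element name="year"><text/></element>
    <element name="month"><text/></element>
    <element name="day"><text/></element>
  </interleave>
</element>
```

The nested element definitions inherit the ns attribute from <date>, but the type attribute does not: an attribute without an explicit ns stays in no namespace.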
In RELAX NG, any pattern element (for example, <grammar>,
<element>, or <choice>) can serve as the document element
of a schema, so you are not limited to a schema whose root is a single
<element>.
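As a quick check of the extended example, the following instance, written for illustration, should validate against the compact schema (for instance, with Jing's -c option). Its children appear in month, day, year order, which the interleave permits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- month, day, year order: allowed because the children are interleaved -->
<date xmlns="http://www.example.com/date" type="US">
  <month>06</month>
  <day>01</day>
  <year>2002</year>
</date>
```

Removing the type attribute would make this document invalid, since the schema now requires it.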
XML Schema Datatypes
So far the elements and attributes in the examples have only used
text content. RELAX NG, as a matter of fact, has only two built-in
datatypes: token and string (which is like
text). You can also use XML Schema datatypes.
In the following compact schema, the datatypes keyword declares a prefix
(xsd) for the QNames of the datatypes and binds it to the datatype library
URI http://www.w3.org/2001/XMLSchema-datatypes, which becomes the value of a
datatypeLibrary attribute in XML syntax:
# Using XML Schema datatypes
namespace dt = "http://www.example.com/date"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
## An ISO 8601, US, or European date format
element dt:date { attribute type {"ISO8601" | "US" | "Euro"},
( element dt:day { xsd:string { pattern = "\d{2}" } } &
element dt:month { attribute days { "28" | "29" | "30" | "31" }?,
xsd:string { pattern = "\d{2}" } } &
element dt:year { xsd:string { pattern = "\d{4}" } } )
}
You can specify the datatype library and prefix explicitly as shown, or you
can omit this line and allow Trang to insert it automatically during
translation. Either way, the datatypeLibrary attribute will appear
in the document element:
<element name="dt:date"
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
xmlns:dt="http://www.example.com/date"
xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
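The following instance, written for illustration, should validate against the datatypes schema. The dt prefix in the instance is arbitrary (only the namespace URI has to match), and the type and days attributes are unprefixed because unprefixed attribute names in RELAX NG belong to no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- the prefix need not be "dt"; the namespace URI is what must match -->
<dt:date xmlns:dt="http://www.example.com/date" type="Euro">
  <dt:day>01</dt:day>
  <!-- days must be one of the listed literals; content must be two digits -->
  <dt:month days="30">06</dt:month>
  <dt:year>2002</dt:year>
</dt:date>
```

By contrast, <dt:month days="32">6</dt:month> would be rejected on two counts: 32 is not one of the listed values, and 6 has only one digit. XML Schema patterns are implicitly anchored, so \d{2} must match the entire content; 006 would fail as well.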
The elements <dt:day>, <dt:month>, and <dt:year>
are all of type xsd:string. The pattern parameter specifies a regular
expression that further constrains their content: exactly two digits for
<dt:day> and <dt:month>, and four digits for <dt:year>.
The optional attribute days provides a choice of literal values: in an
instance, the value of days must be 28, 29, 30, or 31. Other values are
invalid. In XML syntax, the dt:month definition translates as follows:
<element name="dt:month">
<optional>
<attribute name="days">
<choice>
<value>28</value>
<value>29</value>
<value>30</value>
<value>31</value>
</choice>
</attribute>
</optional>
<data type="string">
<param name="pattern">\d{2}</param>
</data>
</element>
This example also shows how to add an annotation element
(<a:documentation>) with the double hash
(##). This annotation comes from the RELAX NG DTD
Compatibility specification and, once translated, looks like this:
<a:documentation>An ISO 8601, US, or European date format</a:documentation>
You can also specify annotations in the compact form with square brackets, like this:
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
[ a:documentation [ "An ISO 8601, US, or European date format" ] ]
When you use this form, you must declare a namespace and prefix for the annotation element. You can insert elements and attributes from any namespace you like (XHTML, for example) as annotations, as long as the namespace is declared and the bracketed syntax is used. Default attribute values can be expressed the same way, with the a:defaultValue annotation:
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
element today {
# a default value only matters if the attribute may be omitted, hence the ?
[ a:defaultValue = "2002" ]
attribute year { text }?,
[ a:defaultValue = "06" ]
attribute month { text }?,
[ a:defaultValue = "01" ]
attribute day { text }?,
empty }
Notice the empty keyword at the end of the content model. As you
can probably guess, it signifies that today is an empty element: it may
carry attributes but has no text or child elements. This is analogous to the DTD declaration:
<!ELEMENT today EMPTY>
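An instance of today is therefore just an element with attributes and nothing else. The following document, written for illustration, supplies all three attributes and should be valid against the schema above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- attributes only: empty permits no text and no child elements -->
<today year="2002" month="06" day="01"/>
```

Adding content, as in <today year="2002" month="06" day="01">June</today>, would make the document invalid, because empty allows no content at all.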