XML.com: XML From the Inside Out

RELAX NG's Compact Syntax
by Michael Fitzgerald

Processing Compact Syntax Schema

To translate a compact-syntax schema to XML syntax, use James Clark's Trang. Assuming that you have downloaded both Jing and Trang, have a Java runtime environment in place, and have placed the JAR files in the path and classpath (see instructions on this for Unix or Windows if you are shaky on such terms), you can convert compact syntax to XML with the following command:


java -jar trang.jar date.rnc date.rng

Trang requires two arguments: an input file and an output file. Trang infers the file formats from the file suffixes (rng for XML syntax, rnc for compact). You can override this behavior with the -i option (rnc or rng) for input files and the -o option (rng or dtd) for output files. As you can see, in addition to XML format, you can also request DTD output from Trang. (Incidentally, you can also use DTDinst to translate a DTD to a schema in RELAX NG XML syntax.) Further, you can set Trang's output encoding with the -e option (for example, -e ISO-8859-1); if the -e option is not present, the output file is written in UTF-8. Trang, by the way, automatically inserts the namespace declaration for RELAX NG itself.

With Jing, you can conveniently validate an instance against a compact schema directly, without transforming it to XML syntax, by using the -c option:


java -jar jing.jar -c date.rnc date.xml

Extending the Example

There are a few additional features worth highlighting. The following compact schema declares a default namespace for elements in an instance, adds a comment (the line beginning with #), and changes the content model of the child elements of date:


default namespace = "http://www.example.com/date"
# RELAX NG schema for a date
element date { attribute type { text },
( element year { text } &
element month { text } &
element day { text } ) }

Notice that the ? operator was dropped from the attribute definition, so the type attribute is now required. The ampersand (&) indicates that adjoining elements are interleaved, that is, these elements can appear in any order in an instance. Parentheses enclose these element definitions. When the schema is translated, the compact comment will appear as a normal XML comment:


<!-- RELAX NG schema for a date -->

The default namespace for the instance will become an ns attribute in the document element of the schema:


<element name="date" ns="http://www.example.com/date"
xmlns="http://relaxng.org/ns/structure/1.0">

In RELAX NG, any element which contains a pattern may also serve as a document element (for example, <grammar>, <element>, and <choice>), so you're not limited to a single element.
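For reference, here is a sketch of an instance that the extended schema above should accept. The type value and the date itself are made up for illustration; the point is that the children of date sit in the default namespace and, because of the interleave, need not appear in year, month, day order:

```xml
<!-- Hypothetical instance; the type value and the date are invented -->
<date xmlns="http://www.example.com/date" type="ISO8601">
  <month>06</month>
  <day>01</day>
  <year>2002</year>
</date>
```

Leaving out the type attribute, or any one of the three child elements, should make the instance invalid.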

XML Schema Datatypes

So far the elements and attributes in the examples have only used text content. RELAX NG, as a matter of fact, has only two built-in datatypes: token and string (which is like text). You can also use XML Schema datatypes.

In the following compact schema, the datatypes declaration binds a prefix for the QNames of the datatypes (xsd) to a datatype library URI (http://www.w3.org/2001/XMLSchema-datatypes), which becomes the value of the datatypeLibrary attribute in the XML syntax:


# Using XML Schema datatypes
namespace dt = "http://www.example.com/date"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

## An ISO 8601, US, or European date format
element dt:date { attribute type {"ISO8601" | "US" | "Euro"},
( element dt:day { xsd:string { pattern = "\d{2}" } } &
element dt:month { attribute days { "28" | "29" | "30" | "31" }?, 
xsd:string { pattern = "\d{2}" } } &
element dt:year { xsd:string { pattern = "\d{4}" } } )
}

You can specify the datatype library and prefix explicitly as shown, or you can omit this line and allow Trang to insert it automatically during translation. Either way, the datatypeLibrary attribute will appear in the document element:


<element name="dt:date"
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
xmlns:dt="http://www.example.com/date"
xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

The element <dt:month>, like <dt:day> and <dt:year>, is of type xsd:string. The pattern parameter specifies a regular expression, further constraining the allowed content of the element to two digits. The optional attribute days provides a choice of literal values: in an instance, days may be 28, 29, 30, or 31, and any other value is invalid. Translated to XML syntax, the definition of <dt:month> looks like this:


<element name="dt:month">
 <optional>
  <attribute name="days">
   <choice>
    <value>28</value>
    <value>29</value>
    <value>30</value>
    <value>31</value>
   </choice>
  </attribute>
 </optional>
 <data type="string">
  <param name="pattern">\d{2}</param>
 </data>
</element>
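To make these constraints concrete, here is a sketch of an instance that the datatype schema above should accept. The values are invented for illustration; only the element names, the namespace, and the allowed attribute values come from the schema:

```xml
<!-- Hypothetical instance; the date values are invented -->
<dt:date xmlns:dt="http://www.example.com/date" type="US">
  <dt:month days="30">06</dt:month>
  <dt:day>01</dt:day>
  <dt:year>2002</dt:year>
</dt:date>
```

An XML Schema pattern has to match the whole value, so a single-digit <dt:day>1</dt:day> would be rejected, as would a days value such as 27 or a type value outside the three listed literals.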

This example also shows how to add an annotation element (<a:documentation>) with the double hash (##). This annotation comes from the RELAX NG compatibility spec and, once translated, looks like


<a:documentation>An ISO 8601, US, or European date
 format</a:documentation>

You can also specify annotations with square brackets in the compact form, like


namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"

[ a:documentation [ "An ISO 8601, US, or European date format" ] ]

When you use this form, you must declare a namespace and prefix for the annotation element. You can insert elements and attributes as annotations from any namespace you like -- say, XHTML -- as long as the namespace is declared and the bracketed syntax is used. Default attribute values may also be represented as annotations:


namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"

element today {

 [ a:defaultValue = "2002" ]
 attribute year { text },

 [ a:defaultValue = "06" ]
 attribute month { text },

 [ a:defaultValue = "01" ]
 attribute day { text },

 empty }

Notice the empty token at the end of the content model. As you can probably guess, this keyword signifies that the element today is an empty element. This syntax is analogous to the DTD syntax:


<!ELEMENT today EMPTY>
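As a rough illustration, an instance matching the today schema above might look like the following. The attribute values simply repeat the defaults from the schema. Since the attributes are declared without the ? operator, this sketch assumes a validator still expects all three to be present; the a:defaultValue annotations do not change what the pattern accepts.

```xml
<!-- Hypothetical instance; all three attributes are supplied explicitly -->
<today year="2002" month="06" day="01"/>
```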
