XML.com: XML From the Inside Out


RELAX NG's Compact Syntax

June 19, 2002

Working with XML Schema is like driving a limousine. It's true that it has some nice appointments (datatypes come to mind), but the wheelbase is a bit on the long side, making it difficult to turn corners easily, and I am inclined to let somebody else do the driving for me. Using RELAX NG, on the other hand, is like driving a sports car. It holds corners amazingly well, and I am much less interested in handing over the keys to anyone. You may prefer to drive a limo over a sports car. But I'll take the sports car any day.

You are probably familiar with XML Schema and RELAX NG. Both are schema languages for XML. The former was released by the W3C in May 2001, while the latter was released in December 2001 by OASIS. RELAX NG, which was developed by a small technical committee led by James Clark, merges Murata Makoto's RELAX and Clark's TREX. It is a simple yet elegant evolution of the DTD, and it is easy to learn. It is modular in design. The core of RELAX NG is focused on validation alone and doesn't modify the infoset in the process of validation; in other words, there is no post-schema-validation infoset (PSVI). RELAX NG is also part of an ISO draft standard, ISO/IEC DIS 19757-2.

RELAX NG schemas were originally written in XML, but there's also a compact, non-XML syntax. While this article doesn't contain an exhaustive review of all the features of RELAX NG, it will give you a good idea of how to use the main parts of the compact syntax. If you don't know much about RELAX NG, I suggest that you read Eric van der Vlist's RELAX NG Compared before finishing this article.


I think you'll find the compact syntax quite readable and easy to learn. In some respects, a RELAX NG schema in compact form looks like a context-free grammar, which provides a familiar view of the language, is readily comprehensible, and is amenable to parsing. Also, don't be surprised if the compact syntax bears a resemblance to the syntax of XDuce and to XQuery's computed element and attribute constructors.

A Simple Example

The first exhibit is a well-formed XML document which represents an ISO 8601 date:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE date SYSTEM "date.dtd">

<date type="ISO8601">
 <year>2002</year>
 <month>06</month>
 <day>01</day>
</date>
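The point of this markup is that it carries an ISO 8601 date. As a minimal sketch, the following Python code (standard library only) reads the year, month, and day back out of the document and rebuilds the date. The function name `read_date` is hypothetical. Note that ElementTree neither loads nor validates against `date.dtd`; it only checks that the document is well-formed.

```python
import datetime
import xml.etree.ElementTree as ET

# The instance document from above, kept as bytes so the parser honours
# the encoding declaration. ElementTree does not fetch the external
# date.dtd, so the file need not exist for this to run.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE date SYSTEM "date.dtd">
<date type="ISO8601">
 <year>2002</year>
 <month>06</month>
 <day>01</day>
</date>"""

def read_date(xml_bytes):
    """Build a datetime.date from the year, month and day children."""
    root = ET.fromstring(xml_bytes)
    # findtext returns the character content of the first matching child.
    year, month, day = (int(root.findtext(tag)) for tag in ("year", "month", "day"))
    return datetime.date(year, month, day)

print(read_date(DOC).isoformat())  # 2002-06-01
```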

This document is an instance of the following DTD:


<!ELEMENT date (year, month, day)>
<!ATTLIST date type CDATA #IMPLIED>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>

It is also valid with regard to the following RELAX NG schema in XML syntax:


<?xml version="1.0" encoding="UTF-8"?>
<element name="date"
xmlns="http://relaxng.org/ns/structure/1.0">
 <optional>
  <attribute name="type"/>
 </optional>
 <element name="year"><text/></element>
 <element name="month"><text/></element>
 <element name="day"><text/></element>
</element>

And here is a version of the schema in RELAX NG's compact syntax:


element date { attribute type { text }?,
               element year { text },
               element month { text },
               element day { text } }
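Schemas like this one are normally checked by a RELAX NG validator. To make each clause of the compact schema concrete without one, here is a hand-written Python check that encodes the clauses one by one. It is a sketch that assumes exactly the structure shown above, not a RELAX NG implementation, and `check_date` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def check_date(elem):
    """Hand-written check mirroring the compact schema for <date>.

    Returns a list of error messages; an empty list means the element
    conforms. This is an illustration, not a RELAX NG validator.
    """
    errors = []
    if elem.tag != "date":
        errors.append(f"expected <date>, found <{elem.tag}>")
    # attribute type { text }? : 'type' is optional (XML already forbids
    # repeating it) and no other attribute is declared.
    extra = sorted(set(elem.attrib) - {"type"})
    if extra:
        errors.append(f"undeclared attributes: {extra}")
    # element year, element month, element day: each exactly once, in order.
    names = [child.tag for child in elem]
    if names != ["year", "month", "day"]:
        errors.append(f"expected children year, month, day; found {names}")
    # { text } : each child holds character data only, no nested elements.
    for child in elem:
        if len(child):
            errors.append(f"<{child.tag}> must contain text only")
    # No text pattern sits directly inside date, so only whitespace may
    # appear between its child elements.
    stray = [elem.text or ""] + [child.tail or "" for child in elem]
    if any(s.strip() for s in stray):
        errors.append("non-whitespace text directly inside <date>")
    return errors

good = ET.fromstring(
    "<date type='ISO8601'><year>2002</year><month>06</month><day>01</day></date>"
)
print(check_date(good))  # []

bad = ET.fromstring("<date><year>2002</year><day>01</day></date>")
print(check_date(bad))
```

Returning a list of messages rather than a single boolean mirrors the way validators usually report every violation they find instead of stopping at the first one.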


No pointy brackets! The RELAX NG schema in XML syntax has the same meaning as the compact schema, but the compact schema is less than half the length. For those who develop schemas without the aid of a dedicated editor (not much exists yet for RELAX NG anyway), that means considerably less time tapping the keyboard. Still and all, the compact syntax has more advantages than concision alone.

A Little Schema Analysis

In these simple examples, the equivalencies are apparent, but I'll mention a few things about them anyway. Take, for example, the declaration or definition of elements. In the DTD, an element type declaration takes the form:


<!ELEMENT year (#PCDATA)>

In RELAX NG's XML syntax, the same element definition looks like this:


<element name="year"><text/></element>

This definition is nicely trimmed down in the compact syntax:


element year { text }
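For text-only elements the correspondence among the three notations is mechanical. This Python sketch spells it out; `text_element` is a hypothetical helper that handles only elements whose content is plain text.

```python
def text_element(name):
    """Return a text-only element definition in the three notations
    compared above: DTD, RELAX NG XML syntax, RELAX NG compact syntax."""
    return {
        "dtd": f"<!ELEMENT {name} (#PCDATA)>",
        "rng": f'<element name="{name}"><text/></element>',
        # Doubled braces in an f-string produce literal { and }.
        "rnc": f"element {name} {{ text }}",
    }

for form, definition in text_element("year").items():
    print(f"{form}: {definition}")
```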

Compact attributes, likewise, have gone on a diet. The attribute list declaration for the implied (optional) attribute type looks like this in the DTD:


<!ATTLIST date type CDATA #IMPLIED>

In RELAX NG's XML syntax, the attribute is defined like so:


<optional>
 <attribute name="type"/>
</optional>

This is equivalent to:


<optional>
 <attribute name="type"><text/></attribute>
</optional>
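In the XML syntax, an attribute definition with no content is treated as if it contained `<text/>`, which is what makes the two forms above equivalent. That defaulting can be sketched as a small tree transformation. The Python below (standard library only; `add_default_text` is a hypothetical name) adds an explicit `<text/>` to every childless `<attribute>` and so turns the first form into the second. It assumes, as in this article, that attributes are named with a `name` attribute rather than with a child name-class element.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
ATTRIBUTE = f"{{{RNG_NS}}}attribute"   # ElementTree's {namespace}local form
TEXT = f"{{{RNG_NS}}}text"

def add_default_text(schema_root):
    """Add an explicit <text/> to every <attribute> that has no children.

    Assumes attributes are named with name="...", so a childless
    <attribute> has no value pattern at all.
    Returns the number of attribute definitions that were changed.
    """
    changed = 0
    # Collect the matches first so the tree is not modified mid-iteration.
    for attr in list(schema_root.iter(ATTRIBUTE)):
        if len(attr) == 0:
            ET.SubElement(attr, TEXT)
            changed += 1
    return changed

SCHEMA = b"""<?xml version="1.0" encoding="UTF-8"?>
<element name="date"
xmlns="http://relaxng.org/ns/structure/1.0">
 <optional>
  <attribute name="type"/>
 </optional>
 <element name="year"><text/></element>
 <element name="month"><text/></element>
 <element name="day"><text/></element>
</element>"""

root = ET.fromstring(SCHEMA)
print(add_default_text(root))  # 1: the type attribute gained <text/>
print(add_default_text(root))  # 0: nothing left to default
```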

In the XML syntax, an attribute definition with no content is assumed to have a text value (what the DTD called CDATA, here written <text/>). The placement of the attribute definition in RELAX NG follows the structure of the XML document instance, which is one reason why the syntax of RELAX NG is rather intuitive. Unlike the RELAX NG XML syntax, the compact definition of the attribute


attribute type { text }?

must use the text token. The ? repetition operator (zero or one, equivalent to <optional>) descends from regular expression notation by way of DTDs, as do the operators * (zero or more, <zeroOrMore>) and + (one or more, <oneOrMore>). The comma (,), sometimes called the sequence operator, separates patterns that must appear in the order given (attributes excepted, because attribute order in a start tag is insignificant); a pattern with no repetition operator must appear exactly once. (Only ? makes sense as an operator for attributes because XML allows any given attribute to be specified only once in a start tag.)
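Because these operators descend from regular expressions, a content model built only from sequences and repetition operators can be checked with an ordinary regex over the names of an element's children. The Python sketch below does that. The list of (name, operator) pairs is a hypothetical stand-in for real compact-syntax parsing. Choice (|), interleave (&), and attributes are deliberately left out, so this is not a general RELAX NG matcher.

```python
import re

# Repetition operators mapped to regex quantifiers. "" stands for a
# pattern with no operator, which must occur exactly once.
QUANTIFIERS = {"": "", "?": "?", "*": "*", "+": "+"}

def compile_sequence(model):
    """Compile (element_name, operator) pairs into a regex that matches
    a space-terminated list of child element names."""
    parts = []
    for name, op in model:
        if op not in QUANTIFIERS:
            raise ValueError(f"unknown operator {op!r}")
        # Each name is matched as a whole token followed by one space.
        parts.append(f"(?:{re.escape(name)} ){QUANTIFIERS[op]}")
    return re.compile("".join(parts))

def conforms(model, child_names):
    """True if child_names, in document order, satisfy the sequence model."""
    text = "".join(f"{name} " for name in child_names)
    return compile_sequence(model).fullmatch(text) is not None

# element year { text }, element month { text }, element day { text }
DATE = [("year", ""), ("month", ""), ("day", "")]
print(conforms(DATE, ["year", "month", "day"]))  # True
print(conforms(DATE, ["year", "day"]))           # False

# A hypothetical variant: month optional, one or more notes at the end.
VARIANT = [("year", ""), ("month", "?"), ("day", ""), ("note", "+")]
print(conforms(VARIANT, ["year", "day", "note", "note"]))  # True
print(conforms(VARIANT, ["year", "month", "day"]))         # False
```

Terminating every name with a space keeps tokens from running together, so a model expecting `day` will not accept a child named `days`.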
