
RELAX NG's Compact Syntax
by Michael Fitzgerald

Going Context Free

I mentioned earlier in the article that the compact syntax can look like a context-free grammar. The following example uses a start symbol, along with other symbols that serve as terminals and non-terminals. For example, the symbol year, on the left side of the equals sign, may be considered a non-terminal, and the element definition on the right side a terminal:


# RELAX NG schema for a date

start = date

date = element date { attribute type { text },
 (year & month & day), limits*}

year = element year { text }
month = element month { text }
day = element day { text }

include "limits.rnc"

The following instance is valid against the foregoing schema:


<?xml version="1.0" encoding="UTF-8"?>
<date type="US">
 <month>June</month>
 <day>1</day>
 <year>2002</year>
 <limits days="30"/>
</date>

When translated into RELAX NG's XML syntax, this example produces a different kind of schema than the ones shown previously: a <grammar> element, a <start> element, and several <define> elements, as seen in this incomplete fragment:


<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- RELAX NG schema for a date -->
  <start>
    <ref name="date"/>
  </start>
  <define name="date">
    <element name="date">
      <attribute name="type"/>
      <interleave>
        <ref name="year"/>
        <ref name="month"/>
        <ref name="day"/>
      </interleave>
      <zeroOrMore>
        <ref name="limits"/>
      </zeroOrMore>
    </element>
  </define>
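The fragment above stops partway through. Assuming Trang performs the translation, the remaining compact definitions and the include would plausibly continue along the following lines; exact formatting may vary between Trang versions, and the XML comment is an editorial annotation rather than translator output. Each compact definition such as year = element year { text } becomes a <define> wrapping an <element>, and the closing </grammar> tag completes the document that the fragment opens.

```xml
  <define name="year">
    <element name="year">
      <text/>
    </element>
  </define>
  <define name="month">
    <element name="month">
      <text/>
    </element>
  </define>
  <define name="day">
    <element name="day">
      <text/>
    </element>
  </define>
  <!-- Editorial note: include "limits.rnc" points at the translated limits.rng -->
  <include href="limits.rng"/>
</grammar>
```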

A <grammar> element is a container for definitions. The <start> element indicates the document element for an instance, just as a document type declaration does. The <define> elements contain patterns which can be referenced by name (with a <ref> element) and therefore easily reused.

Back in the compact schema, a symbol for the limits pattern (modified with *) was added to the end of the content model for date, but where is it defined? It is defined in the included schema limits.rnc (see the include line at the end of the date schema), which looks like this:


# Limits for year, months, and days
limits =
 element limits {
  attribute years { text }?,
  attribute months { text }?,
  attribute days { text }?
 }

When processed, the included compact schema is translated into RELAX NG XML syntax as well. The resulting filename, limits.rng, is inferred from limits.rnc. The included pattern contains a definition for the limits element, which may contain up to three optional attributes. The absence of element children indicates that its content is empty.
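As a sketch of what that translation might look like (assuming Trang as the processor; formatting may differ), limits.rng would hold a single definition in which each trailing ? becomes an <optional> wrapper around an <attribute>. An <attribute> with no child pattern defaults to text, and because the element has no element or text children, its content is empty.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Limits for year, months, and days -->
  <define name="limits">
    <element name="limits">
      <!-- Each "attribute ... { text }?" becomes an optional attribute -->
      <optional>
        <attribute name="years"/>
      </optional>
      <optional>
        <attribute name="months"/>
      </optional>
      <optional>
        <attribute name="days"/>
      </optional>
    </element>
  </define>
</grammar>
```

Since this grammar has no <start> element, it is meant to be included by another grammar, such as the date schema, rather than used on its own.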

Conclusion

It would take several more articles to cover all aspects of RELAX NG in fair detail. This article has only touched lightly on its compact syntax and some of the more commonly used structures of the language. I have neglected some interesting things, such as lists, name classes, merging grammars, and combining definitions. If you've gotten behind the wheel and tested these examples for yourself, you likely have a good feel for just how easy RELAX NG's compact syntax is to learn and use.


Comments
  • Something wrong
    2006-10-27 08:34:49 johnhmichael

    The tutorial's section on RELAX NG's compact syntax and default values reads something like:


    [ a:defaultValue = "bla-bla" ]
    { attribute a0 { huhu }}


    It seems to me that a question mark should close this expression:


    [ ... ] attribute a0 { ... }}?


    Anyway, the Jing validator signals an error if it is not.


    All the best,


    johnhmichael
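For readers following along, here is a minimal sketch of the commenter's point in RELAX NG's XML syntax: the annotated attribute is wrapped in <optional>, which is what the trailing ? does in the compact form [ a:defaultValue = "bla-bla" ] attribute a0 { text }? (assuming the a prefix is bound to the DTD compatibility annotations namespace). The element name example is hypothetical, and text stands in for the commenter's huhu pattern. As the commenter reports, Jing rejects a:defaultValue on an attribute that is not optional.

```xml
<element name="example"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <!-- Hypothetical element; DTD compatibility expects a defaulted attribute to be optional -->
  <optional>
    <attribute name="a0" a:defaultValue="bla-bla"/>
  </optional>
</element>
```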


  • So what?
    2002-06-21 09:46:58 Paul Strand

    I thought the whole point of defining something other than DTDs (RELAX NG or Schema, for example) was to get away from the syntax of DTDs, which is not as readily parseable.


    If XML is described by XML, you only need one parser on your side. Unfortunately, W3C Schemas are so complicated that the advantage of learning only one syntax is offset by the inability to quickly read that syntax for meaning. My take on this was that RELAX NG was the compromise. However, they just reinvented DTDs by deriving an XML syntax based on the DTD syntax and then deriving a completely new syntax from that.


    If they were going to go to all that trouble, why didn't they just use "micro processing" to simplify the syntax of DTDs into something easily readable, yet XML-parseable? For example (and forgive the gratuitous use of Polish notation), they could have done something like this:


    <!ELEMENT DTD (ELEMENT+)>
    <!ATTLIST DTD
    root NMTOKEN #IMPLIED>


    <!ELEMENT ELEMENT (ATTRIBUTE*)>
    <!ATTLIST ELEMENT
    name NMTOKEN #REQUIRED
    sequence CDATA #IMPLIED
    content (PCDATA | EMPTY | ANY) #IMPLIED>


    <!ELEMENT ATTRIBUTE EMPTY>
    <!ATTLIST ATTRIBUTE
    name NMTOKEN #REQUIRED
    type (CDATA | NMTOKEN | NMTOKENS | ID | IDREF | IDREFS) #IMPLIED
    enumeration CDATA #IMPLIED
    declaration (IMPLIED | REQUIRED | FIXED) #IMPLIED
    default CDATA #IMPLIED>


    ...which defines a DTD-like syntax, reusing "*", "?", "|", "+", and "," from DTDs as operators. The above is the DTD that defines how that syntax would be used. Below is how it would be expressed in its own syntax:


    <!DOCTYPE DTD SYSTEM "XMLDTD.dtd">
    <DTD root="DTD">
    <ELEMENT name="DTD" sequence="+ELEMENT">
    <ATTRIBUTE name="root" type="NMTOKEN" declaration="IMPLIED"/>
    </ELEMENT>


    <ELEMENT name="ELEMENT" sequence="*ATTRIBUTE">
    <ATTRIBUTE name="name" type="NMTOKEN" declaration="REQUIRED"/>
    <ATTRIBUTE name="sequence" type="CDATA" declaration="IMPLIED"/>
    <ATTRIBUTE name="content" enumeration="| PCDATA | EMPTY ANY" declaration="IMPLIED"/>
    </ELEMENT>


    <ELEMENT name="ATTRIBUTE" content="EMPTY">
    <ATTRIBUTE name="name" type="NMTOKEN" declaration="REQUIRED"/>
    <ATTRIBUTE name="type" enumeration="| CDATA | NMTOKEN | NMTOKENS | ID | IDREF IDREFS" declaration="IMPLIED"/>
    <ATTRIBUTE name="enumeration" type="CDATA" declaration="IMPLIED"/>
    <ATTRIBUTE name="declaration" enumeration="| IMPLIED | REQUIRED FIXED" declaration="IMPLIED"/>
    <ATTRIBUTE name="default" type="CDATA" declaration="IMPLIED"/>
    </ELEMENT>
    </DTD>


    (For those of you not familiar with Polish notation, the operator comes before its one or two operands. So, A + B in Polish notation becomes + A B. This removes the need for parentheses and, in this case, makes parsing easier.)


    So, what does this compact syntax gain me? Actually, why go with RELAX NG at all if it is not a W3C Recommendation and it suffers from the pitfalls of both DTDs and Schema?