XML.com: XML From the Inside Out

Validation by Instance
by Michael Fitzgerald

Translating the DTD to RELAX NG

James Clark's DTDinst is a Java tool that translates a DTD either into its own XML vocabulary or into a schema in RELAX NG's XML syntax. After downloading and installing dtdinst.jar, you can issue the following command to translate a DTD into RELAX NG:

java -jar dtdinst.jar -i -r rng event.dtd 

This command uses the -jar option because the JAR manifest contains the line:

Main-Class: com/thaiopensource/xml/dtd/app/Driver 

In other words, the manifest tells the Java interpreter where to find the class that contains the main() method, so you don't have to name that class on the command line yourself.

The first argument, the -i option, tells DTDinst to write the RELAX NG attribute elements inline, as children of containing element definitions, rather than as children of define elements. Next, the -r option specifies the directory where the RELAX NG schema should be written. If the directory you name does not exist, it will be created for you. The output file will have the same file name as the DTD, but it will have an rng suffix. The last argument, event.dtd, is of course the DTD that I generated earlier.

The resulting RELAX NG schema event.rng (in the rng directory) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by DTDinst version 2002-07-24. -->
<grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
 xmlns="http://relaxng.org/ns/structure/1.0">
 <define name="date">
  <element name="date">
   <attribute name="type">
    <choice>
     <value>Euro</value>
     <value>ISO</value>
     <value>US</value>
    </choice>
   </attribute>
   <zeroOrMore>
    <choice>
     <ref name="day"/>
     <ref name="month"/>
     <ref name="year"/>
    </choice>
   </zeroOrMore>
  </element>
 </define>
 <define name="day">
  <element name="day">
   <text/>
  </element>
 </define>
 <define name="description">
  <element name="description">
   <text/>
  </element>
 </define>
 <define name="event">
  <element name="event">
   <ref name="description"/>
   <oneOrMore>
    <ref name="date"/>
   </oneOrMore>
  </element>
 </define>
 <define name="month">
  <element name="month">
   <text/>
  </element>
 </define>
 <define name="year">
  <element name="year">
   <text/>
  </element>
 </define>
 <start>
  <choice>
   <ref name="event"/>
  </choice>
 </start>
</grammar>

As you can tell, a RELAX NG schema is easy to grasp. For example, in the date definition, you can easily see that the date element's required attribute type may have one of three possible values, Euro, ISO, or US. Also, the text element is a dead ringer for #PCDATA. Need I go on?

DTDinst generates a grammar element, which is a container for define elements. A grammar element must also contain a start element, which indicates the document element for the instance. I think the choice element surrounding the reference to the event definition is unnecessary, so I will delete it in my own version (see new-event.rng).
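
With the choice wrapper removed, the start element reduces to the fragment below. This is a sketch of the edit just described, not a copy of new-event.rng; the rest of the grammar is assumed to stay as generated.

```xml
<start>
 <ref name="event"/>
</start>
```

A choice with a single alternative matches exactly what that alternative matches, so dropping it does not change which documents the schema accepts.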

The schema could be rewritten without define elements and references to those definitions (the ref elements), but the schema is sufficient as it stands. Now I'll take the translation process a step further by adding WXS to our list.
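
For comparison, a version without define and ref elements might nest each element pattern directly inside its parent, roughly as follows. This is a hand-written sketch and not output from DTDinst; it assumes the same content models as the generated schema above.

```xml
<element name="event" xmlns="http://relaxng.org/ns/structure/1.0">
 <element name="description">
  <text/>
 </element>
 <oneOrMore>
  <element name="date">
   <attribute name="type">
    <choice>
     <value>Euro</value>
     <value>ISO</value>
     <value>US</value>
    </choice>
   </attribute>
   <zeroOrMore>
    <choice>
     <element name="day">
      <text/>
     </element>
     <element name="month">
      <text/>
     </element>
     <element name="year">
      <text/>
     </element>
    </choice>
   </zeroOrMore>
  </element>
 </oneOrMore>
</element>
```

The nested form is shorter, but the named definitions in the generated schema are easier to reuse if an element ever needs to appear in more than one place.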

Translating RELAX NG to XML Schema

Trang is another tool written by James Clark. It can take as input a schema written in RELAX NG XML or compact syntax; it can produce RELAX NG XML, RELAX NG compact syntax, DTD, and WXS as output. After downloading Trang (which includes a JAR file for Jing, a RELAX NG validator), unzipping and installing it, you can convert the RELAX NG schema back to a DTD, new-event.dtd, with this command:

java -jar trang.jar rng/event.rng new-event.dtd

The DTD output of Trang is nearly identical to the one produced by DTDGenerator. If the file suffixes used with Trang don't match the content of the files, you can specify the input format with the -i option and the output format with the -o option. You can name either rng or rnc as input, and one of rng, rnc, dtd, or xsd as output. For example, using -i and -o you can issue the preceding command as:

java -jar trang.jar -i rng -o dtd rng/event.rng new-event.dtd

You can also produce XML Schema output with the command:

java -jar trang.jar rng/event.rng event.xsd

Trang's WXS output is still in the alpha stage, so there may be some changes in the future. The WXS output from event.rng follows:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified" version="1.0">
 <xs:element name="date">
  <xs:complexType>
   <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element ref="day"/>
    <xs:element ref="month"/>
    <xs:element ref="year"/>
   </xs:choice>
   <xs:attribute name="type" use="required">
    <xs:simpleType>
     <xs:restriction base="xs:token">
      <xs:enumeration value="Euro"/>
      <xs:enumeration value="ISO"/>
      <xs:enumeration value="US"/>
     </xs:restriction>
    </xs:simpleType>
   </xs:attribute>
  </xs:complexType>
 </xs:element>
 <xs:element name="day">
  <xs:complexType mixed="true"/>
 </xs:element>
 <xs:element name="description">
  <xs:complexType mixed="true"/>
 </xs:element>
 <xs:element name="event">
  <xs:complexType>
   <xs:sequence>
    <xs:element ref="description"/>
    <xs:element maxOccurs="unbounded" ref="date"/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
 <xs:element name="month">
  <xs:complexType mixed="true"/>
 </xs:element>
 <xs:element name="year">
  <xs:complexType mixed="true"/>
 </xs:element>
</xs:schema>

The date element might have been defined as a regular complex type rather than as an anonymous type, but this works nonetheless. Also, this construct occurs four times in the schema:


<xs:element name="day">
 <xs:complexType mixed="true"/>
</xs:element>

This says that the content of the day element (and the content of description, month, and year as well) implicitly allows text node children only. This is a little unclear at first glance. In my version, I changed the element content by hand as follows in all four instances (see it in new-event.xsd):

<xs:element name="day" type="xs:string"/>
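
Returning to the date element, a version that uses a named complex type instead of an anonymous one might look like the sketch below. The type name dateType is invented for illustration; it does not appear in Trang's output or, as far as this page shows, in new-event.xsd.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 elementFormDefault="qualified" version="1.0">
 <!-- dateType is a hypothetical name chosen for this sketch -->
 <xs:complexType name="dateType">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
   <xs:element ref="day"/>
   <xs:element ref="month"/>
   <xs:element ref="year"/>
  </xs:choice>
  <xs:attribute name="type" use="required">
   <xs:simpleType>
    <xs:restriction base="xs:token">
     <xs:enumeration value="Euro"/>
     <xs:enumeration value="ISO"/>
     <xs:enumeration value="US"/>
    </xs:restriction>
   </xs:simpleType>
  </xs:attribute>
 </xs:complexType>
 <xs:element name="date" type="dateType"/>
 <!-- declarations for day, month, year, description, and event stay as shown above -->
</xs:schema>
```

A named type accepts the same instances as the anonymous one; the difference is that other element declarations could reuse it.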

Now that I have derived schemas from an XML document in DTD, RELAX NG, and W3C XML Schema, I'll attempt to validate the original instance against all three.
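
The original instance is not reproduced on this page. As a stand-in, here is a hypothetical instance, with content invented purely for illustration, that fits the RELAX NG and W3C XML Schema versions shown above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: element and attribute names come from the schemas, the values are made up -->
<event>
 <description>Example event</description>
 <date type="ISO">
  <year>2002</year>
  <month>08</month>
  <day>28</day>
 </date>
</event>
```

Because date is declared as a repeatable choice of day, month, and year, the three children could appear in any order, or be left out entirely, and the document would still be valid.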

Validating the Instance

There are a number of validators to choose from, but I'll use Sun's Multi-Schema Validator because it can validate against schemas in all three formats: DTD, RELAX NG, and W3C XML Schema. Assuming that you have downloaded MSV and that all the JARs are installed (there are four), here is the command for performing the validation for the DTD:

java -cp xerces.jar;xsdlib.jar;relaxngDatatype.jar;isorelax.jar -jar msv.jar event.dtd event.xml

If you are on the Windows platform, you can use a batch file I created to simplify the command (see msv.bat in the file archive).

To validate against the other schemas, replace event.dtd with the name of another schema file. You can test all the schemas in the file archive if you like. They are all valid, though MSV issues a warning about elements that have the content:


<xs:complexType mixed="true"/>

Conclusion

If you work on the Windows platform, I have also written a batch file, validate.bat, that performs all the translations (from instance, to DTD, to RELAX NG, and finally to W3C XML Schema) and then validates against each result in one step. You will find it in the archive. It calls four other batch files in the same directory (dtd.bat, rng.bat, xsd.bat, and msv.bat).

Using the tools I've described here, you can perform the conversions and validate against the resulting schemas in a matter of seconds. You may still prefer to use a visual editor, but I believe that learning and using these tools can save you time and money.
