Learning to RELAX
In this article, we'll explore some of the more advanced features of the RELAX schema language by using it to create a schema for the XMLNews-Story Markup Language. Although the XMLNews-Story Markup Language has been superseded by the News Industry Text Format, I've chosen it because it's simple, quite widely used, looks a great deal like HTML, and its RELAX specification will use most of the features we want to focus on.
A RELAX document is enclosed within a <module>
element. The opening tag looks like
<module
moduleVersion="1.2"
relaxCoreVersion="1.0"
targetNamespace=""
xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
The module tag is followed by the <interface>
element which specifies the root element of the document
that this RELAX schema is intended to validate. In the
case of a news story, this is
<interface>
<export label="nitf"/>
</interface>
If you have two types of documents that are the same except for their root element, you can have one RELAX document that will validate either kind of document.
<interface>
<export label="account-receivable"/>
<export label="account-payable"/>
</interface>
After the interface is specified, you specify the types of elements to be validated.
Every element in the target markup language is described by an
<elementRule>, which describes its content, and a
corresponding <tag>.
Each <elementRule> plays
a role in the document's structure and has a
label to which other elements can refer.
An empty element, such as <br/>, is specified thus:
<elementRule role="br" label="br">
<empty/>
</elementRule>
<tag name="br"/>
If the label is omitted, it is presumed to be the same
as the role.
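Because an omitted label is presumed to equal the role, the rule above can presumably be shortened as in this sketch. It restates the same rule rather than adding a new one.

```xml
<!-- Same rule as above, relying on the default: the label is taken to be "br" -->
<elementRule role="br">
  <empty/>
</elementRule>
<tag name="br"/>
```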
A news story has a <byline>, which in turn includes
a <bytag> element that declares who wrote the story.
The <bytag> element consists of string data.
RELAX specifies this as follows.
<elementRule role="byline">
<ref label="bytag"/>
</elementRule>
<tag name="byline"/>
<elementRule role="bytag" type="string"/>
<tag name="bytag"/>
The type attribute specifies an element's datatype. The valid values
for a datatype include those taken from the XML Document Type
Definition, such as NMTOKEN and ID, as well
as those introduced by XML
Schema Part 2: Datatypes, such as string,
float, nonPositiveInteger, etc.
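As a small illustration, a document fragment that should satisfy the byline and bytag rules above might look like the following. The author name is invented for the example.

```xml
<!-- "Jane Doe" is a made-up name; bytag holds plain string data -->
<byline><bytag>By Jane Doe</bytag></byline>
```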
As in HTML documents, a news story's <nitf> element
contains a <head> element followed by a <body>
element. In a news story, both are required.
<elementRule role="nitf" label="nitf">
<sequence>
<ref label="head"/>
<ref label="body"/>
</sequence>
</elementRule>
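To show how the pieces introduced so far might fit together, here is a hedged skeleton of a module for news stories. The tag for nitf is an assumption (the article does not show it), and the rules for head, body, and their descendants are left as a placeholder comment.

```xml
<module
    moduleVersion="1.2"
    relaxCoreVersion="1.0"
    targetNamespace=""
    xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

  <!-- the root element that this module validates -->
  <interface>
    <export label="nitf"/>
  </interface>

  <!-- nitf: a head followed by a body, both required -->
  <elementRule role="nitf" label="nitf">
    <sequence>
      <ref label="head"/>
      <ref label="body"/>
    </sequence>
  </elementRule>
  <!-- assumed: the tag that pairs with the rule above -->
  <tag name="nitf"/>

  <!-- element rules and tags for head, body, and their
       descendants would go here -->

</module>
```

Under this rule, a document whose <nitf> element lists <body> before <head>, or omits either one, would not be accepted.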
On the other hand, some elements may contain sub-elements in any
order. News stories, like HTML, can have tables, and their table rows
(<tr>) can contain cells (either <td> or
<th>) in any order:
<elementRule role="tr">
<choice occurs="*">
<ref label="td"/>
<ref label="th"/>
</choice>
</elementRule>
The occurs attribute can be used with a
<choice>, <sequence>, or
<ref> tag. It has three possible values.
| Value | Meaning |
| * | occurs zero or more times |
| + | occurs one or more times |
| ? | occurs zero or one time |
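Applied to the tr rule above, occurs="*" on the choice means that any number of cells, in any mixture, should be accepted. The fragments below are a sketch; the cell text is invented, and it assumes that td and th hold plain text.

```xml
<!-- header and data cells may be mixed freely -->
<tr><th>Quarter</th><td>Q1</td><td>Q2</td><th>Total</th></tr>

<!-- an empty row also matches, because occurs="*" allows zero cells -->
<tr/>
```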
A news story description list (<dl>) contains an optional list
header followed by one or more entries, each consisting of an optional
description title and a required description data element. You write it thus:
<elementRule role="dl">
<sequence>
<ref label="lh" occurs="?"/>
<sequence occurs="+">
<ref label="dt" occurs="?"/>
<ref label="dd"/>
</sequence>
</sequence>
</elementRule>
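The following fragments sketch what this rule should accept and reject. The text content is invented, and the example assumes that lh, dt, and dd hold plain text and that whitespace between elements is ignored.

```xml
<!-- matches: optional lh, then two (dt?, dd) groups, the second without a dt -->
<dl>
  <lh>Glossary</lh>
  <dt>RELAX</dt>
  <dd>A schema language for XML.</dd>
  <dd>It is described in modules.</dd>
</dl>

<!-- does not match: every group must end in a dd, and this dt has none -->
<dl>
  <dt>RELAX</dt>
</dl>
```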
Some information requires text that isn't between tags. For example,
a news story <location> can look like
The movie was filmed in
<location>
<city>Pekin</city>, a
small city in
<state>Illinois</state>
</location>.
The text ", a small city in" is inside the <location> element,
but it isn't part of any sub-element.
That makes <location> a mixed element, specified
(in part) like
<elementRule role="location">
<mixed>
<choice occurs="*">
<ref label="city"/>
<ref label="state"/>
<ref label="region"/>
<!-- etc. -->
</choice>
</mixed>
</elementRule>
If you look at the specification for a news story, you'll find that
a paragraph (<p>) is a mixed element that can
contain, among others, <br>,
<person>, <location>, and quoted
phrase (<q>) sub-elements.
Likewise a quoted phrase can contain exactly the same sub-elements. Rather than specify the common sub-elements twice, RELAX allows you to specify a hedge rule:
<hedgeRule label="common.elements">
<choice>
<ref label="br"/>
<ref label="person"/>
<ref label="location"/>
<ref label="q"/>
</choice>
</hedgeRule>
Once a hedge rule is established, other
specifications can refer to it by using the <hedgeRef> tag.
<elementRule role="p">
<mixed>
<hedgeRef label="common.elements" occurs="*"/>
</mixed>
</elementRule>
<elementRule role="q">
<mixed>
<hedgeRef label="common.elements" occurs="*"/>
</mixed>
</elementRule>
Note that the <hedgeRule> can only describe elements;
it cannot be <mixed>. That's why
each <elementRule> has its own
<mixed> tag in the example above.
You may have noticed that the specification says that a quoted phrase can contain another quoted phrase. While this may be unusual in a news story, it's technically possible, and RELAX is not bothered by this at all.
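A paragraph that exercises these rules, including a quoted phrase nested inside another, might look like the sketch below. The names and wording are invented, and the example assumes that person holds plain text.

```xml
<!-- text, person, q, br, and location may appear in any order inside p -->
<p><person>Jane Doe</person> said <q>the crew called Pekin
<q>a perfect set</q></q>.<br/>
The film opens next month in <location><city>Pekin</city></location>.</p>
```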
Of course, XML does not consist of elements alone, as we'll see in the next section.
XML tags can have attributes, and RELAX allows you to specify them in
great detail. A news story, like HTML, can include an
<img> tag which has a required src and
optional width and height attributes. RELAX
treats attributes as part of the tag, so the full specification of an
image is as follows:
<elementRule role="img">
  <sequence>
    <!-- its sub-elements -->
  </sequence>
</elementRule>
<tag name="img">
  <attribute name="src" required="true" type="string"/>
  <attribute name="width" type="positiveInteger"/>
  <attribute name="height" type="positiveInteger"/>
</tag>
Notice that RELAX lets you specify that an image's width and height must be positive integers.
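The fragments below sketch how these attribute declarations should behave. The file name is invented, and the example assumes that img needs no sub-elements, which the rule above leaves unspecified.

```xml
<!-- matches: src is present, width and height are positive integers -->
<img src="photo.jpg" width="320" height="240"/>

<!-- does not match: the required src attribute is missing -->
<img width="320" height="240"/>

<!-- does not match: 0 is not a positiveInteger -->
<img src="photo.jpg" width="0" height="240"/>
```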
Just as it was possible to create a re-usable element specification,
so it is possible to create a set of attributes that can be reused by
many tags. For example, both the table body
(<tbody>) and table header
(<th>) elements have identical attributes for
determining their horizontal and vertical alignment. This makes those
attributes a perfect candidate for an attribute pool.
<attPool role="alignment">
  <attribute name="align" type="string">
    <enumeration value="left"/>
    <enumeration value="center"/>
    <enumeration value="right"/>
    <enumeration value="justify"/>
  </attribute>
  <attribute name="valign" type="string">
    <enumeration value="top"/>
    <enumeration value="middle"/>
    <enumeration value="bottom"/>
    <enumeration value="baseline"/>
  </attribute>
</attPool>
And now this attribute pool may be used in multiple tags.
<tag name="tbody">
  <ref role="alignment"/>
</tag>
<tag name="th">
  <ref role="alignment"/>
  <attribute name="rowspan" type="integer">
    <minInclusive value="1"/>
  </attribute>
  <attribute name="colspan" type="integer">
    <minInclusive value="1"/>
  </attribute>
</tag>
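As an illustration of what these tag definitions should accept, consider the fragments below. The cell text is invented, and the example assumes that th holds plain text.

```xml
<!-- matches: align and valign come from the attribute pool -->
<th align="center" valign="top" colspan="2">Totals</th>

<!-- does not match: "middle" is not among the enumerated align values -->
<th align="middle">Totals</th>

<!-- does not match: rowspan must be at least 1 -->
<th rowspan="0">Totals</th>
```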
There are two things to note about the <tbody> and <th> tag definitions above:
the <th> tag has attributes in addition to those included via the reference
to the attribute pool; and its rowspan and colspan attributes use
<minInclusive> to require integers of at least 1.
Practically everything we've done up to this point is possible with DTD specifications. Now let's examine something that RELAX can do that other methods can't do.
As the definition of a news story currently stands, both a list item
(<li>) and the <body.content>
tag may contain an information block (<block>)
element. An information block may contain, among other things,
<p>, <ul>,
<ol> and <img> elements.
Let's say that we would like to set up news stories so that a block in
the main body content can contain all these elements, but a block
inside of list items may not contain images. To do this, we
first set up the element rule for a <block> in the
body content and the corresponding tag. Note that we have a
label that is different from the role.
<elementRule role="block" label="block-in-content">
  <mixed>
    <choice occurs="*">
      <ref label="p"/>
      <ref label="ul"/>
      <ref label="ol"/>
      <ref label="img"/>
    </choice>
  </mixed>
</elementRule>
<tag name="block"/>
We then add another rule for the element that plays the role of a
block, but this rule is labeled for use in a list.
<elementRule role="block" label="block-in-list">
  <mixed>
    <choice occurs="*">
      <ref label="p"/>
      <ref label="ul"/>
      <ref label="ol"/>
    </choice>
  </mixed>
</elementRule>
Both of these rules are for elements that play the role of a
block. In the definition of the
<body.content> element, we refer to the appropriate
<block> rule.
<elementRule role="body.content">
  <ref label="block-in-content"/>
</elementRule>
And in the definition of a list item, we refer to the other rule.
<elementRule role="li">
  <ref label="block-in-list"/>
</elementRule>
The <block> tag will now be validated differently,
depending on the context in which it appears.
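The two fragments below sketch the effect of the context-dependent rules. The text and file name are invented, and the example assumes that p holds plain text and that img needs no sub-elements.

```xml
<!-- accepted: a block in the body content may hold an image -->
<body.content>
  <block>
    <p>Quarterly results</p>
    <img src="chart.png"/>
  </block>
</body.content>

<!-- rejected: a block inside a list item may not hold an image -->
<li>
  <block>
    <p>Quarterly results</p>
    <img src="chart.png"/>
  </block>
</li>
```

The same <block> content is judged against block-in-content in the first fragment and against block-in-list in the second, which is why only the first should pass.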
A news story can use the <money> tag to indicate
that a number is a monetary amount. Its RELAX definition looks
like
<elementRule role="money" type="decimal"/>
<tag name="money">
  <attribute name="unit" type="string"/>
</tag>
Let's say we'd like to extend this definition so that there are two
types of <money> elements, one for costs and
another for balances. An item's cost is never negative; a company's
balance can be either positive or negative. In other words, we'd like
to be able to specify that
After paying <money unit="dollar" usage="cost">5</money> dollars, the account had a balance of <money unit="dollar" usage="balance">-10</money> dollars.
In this case, rather than using the context to choose the content of a tag, we want to use one of the tag's attributes to determine what content that tag may contain. To accomplish this, we'll need two element rules.
<!-- costs are greater than or equal to zero -->
<elementRule role="money-cost" label="money" type="decimal">
  <minInclusive value="0"/>
</elementRule>
<!-- balances are unrestricted -->
<elementRule role="money-balance" label="money" type="decimal"/>
In the previous section, we had element rules with the same role and different labels; here we have element rules with the same label and different roles. Now we define tags that refer to the different roles:
<tag name="money" role="money-cost">
  <attribute name="unit" type="string"/>
  <attribute name="usage" type="string">
    <enumeration value="cost"/>
  </attribute>
</tag>
<tag name="money" role="money-balance">
  <attribute name="unit" type="string"/>
  <attribute name="usage" type="string">
    <enumeration value="balance"/>
  </attribute>
</tag>
Thus, the <money> tag plays the
money-cost role when the usage attribute is
cost; and it plays the money-balance role
when its usage is balance.
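The fragments below sketch how the two tags and two rules should combine. The amounts are invented for illustration.

```xml
<!-- matches the money-cost tag and rule: usage is cost, value is not negative -->
<money unit="dollar" usage="cost">5</money>

<!-- matches the money-balance tag and rule: any decimal is allowed -->
<money unit="dollar" usage="balance">-10</money>

<!-- does not match: usage="cost" selects money-cost, which rejects -5 -->
<money unit="dollar" usage="cost">-5</money>

<!-- does not match: no tag declares "refund" as a usage value -->
<money unit="dollar" usage="refund">5</money>
```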
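RELAX is a powerful markup language that permits you to specify how other XML documents are to be validated. You may, as with other specification methods, describe which elements a document contains, the order and number of times they may occur, and the attributes they carry.
Additionally, RELAX gives you the power to constrain element content and attribute values with datatypes, and to validate the same tag differently depending on its context or on the values of its attributes.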
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.