Learning to RELAX
In this article, we'll explore some of the more advanced features of the RELAX schema language by using it to create a schema for the XMLNews-Story Markup Language. Although the XMLNews-Story markup language has been superseded by the News Industry Text Format, I've chosen it because it's simple, quite widely used, looks a great deal like HTML, and its RELAX specification will use most of the features we want to focus on.
A RELAX document is enclosed within a <module> element. The opening tag looks like
<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace="" xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
The module tag is followed by the <interface> element, which specifies the root element of the documents that this RELAX schema is intended to validate. In the case of a news story, this is <nitf>:
<interface> <export label="nitf"/> </interface>
If you have two types of documents that are the same except for their root element, you can have one RELAX document that will validate either kind of document by exporting both labels:
<interface> <export label="account-receivable"/> <export label="account-payable"/> </interface>
After the interface is specified, you specify the types of elements to be validated.
Every element in the target markup language is described by an <elementRule>, which describes its content, and a <tag>, which declares its tag name and attributes. An <elementRule> has a role, which ties it to its <tag>, and a label to which other rules can refer.
An empty element, such as
<br/>, is specified thus:
<elementRule role="br" label="br"> <empty/> </elementRule> <tag name="br"/>
If the label is omitted, it is presumed to be the same as the role.
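As a small illustration of that default, the two rules below are alternative spellings of the same thing (a module would contain one or the other, not both). This is a sketch based on the <br/> rule above, not an excerpt from the actual news story schema.

```xml
<!-- Spelling 1: the label is given explicitly -->
<elementRule role="br" label="br">
  <empty/>
</elementRule>

<!-- Spelling 2: the label is omitted, so it defaults to the role, "br" -->
<elementRule role="br">
  <empty/>
</elementRule>
```

Either way, other rules reach this one with <ref label="br"/>.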
Sub-Elements and Element Types
A news story has a <byline>, which in turn includes a <bytag> element that declares who wrote the story. The <bytag> element consists of string data. RELAX specifies this as follows.
<elementRule role="byline">
  <ref label="bytag"/>
</elementRule>
<tag name="byline"/>

<elementRule role="bytag" type="string"/>
<tag name="bytag"/>
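To make the rule concrete, here is a sample fragment that should satisfy it. The author name is invented for illustration.

```xml
<!-- A byline containing exactly one bytag, whose content is plain string data -->
<byline>
  <bytag>By Jane Doe, Staff Writer</bytag>
</byline>
```

Because the <ref> to bytag carries no occurs attribute, exactly one <bytag> is expected; an empty <byline/> or a byline with loose text outside the <bytag> would not match this rule.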
The type attribute specifies an element's datatype. The valid values for a datatype include those taken from the XML Document Type Definition, such as ID, as well as those introduced by XML Schema Part 2: Datatypes, such as string.
As in HTML documents, a <nitf> news story element contains a <head> element followed by a <body> element. In a news story, they are both required.
<elementRule role="nitf" label="nitf"> <sequence> <ref label="head"/> <ref label="body"/> </sequence> </elementRule>
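Putting the pieces so far together, a complete but deliberately tiny module might look like the sketch below. The rules for head and body are placeholders that treat both as plain strings, which is an assumption made only to keep the example short; the real news story schema gives them much richer content. The <tag name="nitf"/> line follows the same pattern as the earlier examples.

```xml
<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace=""
        xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

  <!-- The root element that this module validates -->
  <interface>
    <export label="nitf"/>
  </interface>

  <!-- A news story is a head followed by a body, both required -->
  <elementRule role="nitf" label="nitf">
    <sequence>
      <ref label="head"/>
      <ref label="body"/>
    </sequence>
  </elementRule>
  <tag name="nitf"/>

  <!-- ASSUMPTION: placeholder rules, not the real content models -->
  <elementRule role="head" type="string"/>
  <tag name="head"/>

  <elementRule role="body" type="string"/>
  <tag name="body"/>

</module>
```

Under these placeholder rules, a document such as <nitf><head>Title</head><body>Text</body></nitf> should validate, while one with the body before the head should not.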
On the other hand, some elements may contain sub-elements in any order. News stories, like HTML, can have tables, and their table rows (<tr>) can contain cells (either <td> or <th>) in any order:
<elementRule role="tr"> <choice occurs="*"> <ref label="td"/> <ref label="th"/> </choice> </elementRule>
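For instance, a row like the following should match the rule, since header and data cells may be interleaved freely. The cell contents are invented, and the sketch assumes that <td> and <th> accept plain text, which the rules shown here do not spell out.

```xml
<!-- th and td cells in arbitrary order, as allowed by <choice occurs="*"> -->
<tr>
  <th>Team</th>
  <td>Cubs</td>
  <td>3</td>
  <th>Notes</th>
  <td>Final</td>
</tr>
```

Note that because the choice may occur zero times, an empty <tr/> also matches; changing occurs to "+" would require at least one cell.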
The occurs attribute can be used with a <sequence>, <choice>, or <ref> tag. It has three possible values.
| Value | Meaning |
| * | occurs zero or more times |
| + | occurs one or more times |
| ? | occurs zero or one time |
You write a news story description list (<dl>) containing an optional list header, followed by one or more groups of an optional description title and a required descriptive data entry, thus:
<elementRule role="dl">
  <sequence>
    <ref label="lh" occurs="?"/>
    <sequence occurs="+">
      <ref label="dt" occurs="?"/>
      <ref label="dd"/>
    </sequence>
  </sequence>
</elementRule>
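A sample list may help show what this nested sequence accepts. The entries below are invented, and the sketch assumes that <lh>, <dt>, and <dd> hold plain text.

```xml
<dl>
  <!-- optional list header -->
  <lh>Glossary</lh>
  <!-- group 1: a title and its data -->
  <dt>RELAX</dt>
  <dd>A schema language for XML.</dd>
  <!-- group 2: data with no title, allowed because dt is optional -->
  <dd>An entry with no title of its own.</dd>
  <!-- group 3: another title and its data -->
  <dt>NITF</dt>
  <dd>News Industry Text Format.</dd>
</dl>
```

By the same reading, an empty <dl/>, a <dt> with no <dd> after it, or two <dt> elements in a row should all be rejected, since each repetition of the inner sequence needs exactly one <dd>.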
Some information requires text that isn't between tags. For example,
a news story
<location> can look like
The movie was filmed in <location> <city>Pekin</city>, a small city in <state>Illinois</state> </location>.
The text ", a small city in" above is inside the <location> element, but it isn't part of any sub-element. This makes <location> a mixed element, specified (in part) like
<elementRule role="location">
  <mixed>
    <choice occurs="*">
      <ref label="city"/>
      <ref label="state"/>
      <ref label="region"/>
      <!-- etc. -->
    </choice>
  </mixed>
</elementRule>
If you look at the specification for a news story, you'll find that a paragraph (<p>) is a mixed element that can contain, among others, <br>, <person>, <location>, and quoted phrases (<q>).
Likewise a quoted phrase can contain exactly the same sub-elements. Rather than specify the common sub-elements twice, RELAX allows you to specify a hedge rule:
<hedgeRule label="common.elements">
  <choice>
    <ref label="br"/>
    <ref label="person"/>
    <ref label="location"/>
    <ref label="q"/>
  </choice>
</hedgeRule>
Once a hedge rule is established, other specifications can refer to it by using the <hedgeRef> tag:
<elementRule role="p">
  <mixed>
    <hedgeRef label="common.elements" occurs="*"/>
  </mixed>
</elementRule>

<elementRule role="q">
  <mixed>
    <hedgeRef label="common.elements" occurs="*"/>
  </mixed>
</elementRule>
Note that a <hedgeRule> can only describe elements; it cannot be <mixed>. That's why each <elementRule> has its own <mixed> tag in the example above.
You may have noticed that the specification says that a quoted phrase can contain another quoted phrase. While this may be unusual in a news story, it's technically possible, and RELAX is not bothered by this at all.
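As an illustration of that recursion, the paragraph below nests one quoted phrase inside another and mixes in the other common elements. The wording and the person's name are invented, and the sketch assumes that <person> and <city> accept plain text.

```xml
<!-- p and q are both mixed, and both may contain br, person, location, and q -->
<p>The witness said, <q>He told me, <q>I was not there,</q> and then left.</q><br/>
<person>Jane Doe</person> testified in <location><city>Pekin</city></location>.</p>
```

The inner <q> is matched by the same rule as the outer one, so the nesting can go as deep as the story requires.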
Of course, XML does not consist of elements alone, as we'll see in the next section.