Managing Enumerations in W3C XML Schemas
Introduction
When working with data-oriented XML, there is often a requirement to handle "controlled vocabularies", otherwise known as enumerated values. Consider the following example of a bank account summary:
<accountSummary>
<timestamp>2003-01-01T12:25:00</timestamp>
<currency>USD</currency>
<balance>2703.35</balance>
<interest rounding="down">27.55</interest>
</accountSummary>
There are two controlled vocabularies in this document. One is the
currency, which is an
ISO-4217 3-letter currency code ("USD" is US Dollar). The
other is the rounding direction for the interest, which can be
"up", "down", or "nearest". The
bank in this example prefers to round the interest down.
The problem in designing this schema is that the ISO 3-letter currency codes are externally controlled. They can change at any time. If you embed them in your schema, you need to reissue the schema every time ISO makes a change, which can be expensive. This is especially true in enterprise situations where any schema change, no matter how small, can require full retesting of any applications that use the schema. Such forced reissues should be avoided whenever possible.
In this article, we will discuss how controlled vocabularies can be managed when using W3C XML Schemas, since this is the dominant XML schema format for data-oriented XML. Note that the "vocabularies" we refer to are enumerated lists of element and attribute values. This differs from other contexts where "vocabularies" are sets of XML element names.
Step 1: Monolithic Schema
Before worrying about which controlled vocabularies are out of our control, the first thing to do is create a schema, using W3C XML Schema, for the account summaries. For the purposes of this article, we will use just a subset of the ISO 3-letter currency codes. A suitable schema is:
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
version = "1.0"
elementFormDefault = "qualified">
<xsd:element name = "accountSummary">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref = "timestamp"/>
<xsd:element ref = "currency"/>
<xsd:element ref = "balance"/>
<xsd:element ref = "interest"/>
</xsd:sequence>
<xsd:attribute name = "version" use = "required">
<xsd:simpleType>
<xsd:restriction base = "xsd:string">
<xsd:pattern value = "[1-9]+[0-9]*\.[0-9]+"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>
</xsd:element>
<xsd:element name = "timestamp" type = "xsd:dateTime"/>
<xsd:element name = "currency" type = "iso3currency"/>
<xsd:element name = "balance" type = "xsd:decimal"/>
<xsd:element name = "interest">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base = "xsd:decimal">
<xsd:attribute name = "rounding" use = "required"
type = "roundingDirection"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
<xsd:simpleType name = "iso3currency">
<xsd:annotation>
<xsd:documentation>ISO-4217 3-letter currency codes,
as defined at
http://www.bsi-global.com/Technical+Information/Publications/_Publications/tig90.xalter
or available from
http://www.xe.com/iso4217.htm
Only a subset are defined here.</xsd:documentation>
</xsd:annotation>
<xsd:restriction base = "xsd:string">
<xsd:enumeration value = "AUD"/><!-- Australian Dollar -->
<xsd:enumeration value = "BRL"/><!-- Brazilian Real -->
<xsd:enumeration value = "CAD"/><!-- Canadian Dollar -->
<xsd:enumeration value = "CNY"/><!-- Chinese Yuan -->
<xsd:enumeration value = "EUR"/><!-- Euro -->
<xsd:enumeration value = "GBP"/><!-- British Pound -->
<xsd:enumeration value = "INR"/><!-- Indian Rupee -->
<xsd:enumeration value = "JPY"/><!-- Japanese Yen -->
<xsd:enumeration value = "RUR"/><!-- Russian Rouble -->
<xsd:enumeration value = "USD"/><!-- US Dollar -->
<xsd:length value = "3"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name = "roundingDirection">
<xsd:annotation>
<xsd:documentation>Whether the interest is
rounded up, down or to the
nearest round value.</xsd:documentation>
</xsd:annotation>
<xsd:restriction base = "xsd:string">
<xsd:enumeration value = "up"/>
<xsd:enumeration value = "down"/>
<xsd:enumeration value = "nearest"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
Notice the two controlled vocabularies (enumerations), the simple types
iso3currency and roundingDirection. For
iso3currency, the length of the enumeration strings is
explicitly set to 3, to help catch careless editing errors in the future when
the list of currencies needs to be updated.
Note also that the schema's optional version attribute has
been set to "1.0". When working with data-oriented XML messages, it is
usually necessary to support multiple versions of the message schema
concurrently, as the systems that use the message schema will probably not
be able to upgrade to the latest version simultaneously. So, it is vital
to identify the schema version that an XML message was validated
against. In keeping with this, we will name our schema
accountSummary-1.0.xsd, so that future versions won't
overwrite the current version.
Further, a version attribute has been added to the
accountSummary element, so that message instances clearly
identify their schema version. It is assumed that the version numbers have
the form M.N where M is the major version number
and N is the minor version number. With this change, plus a
reference to the schema, the account summary now becomes:
<accountSummary
version = "1.0"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation = "accountSummary-1.0.xsd">
<timestamp>2003-01-01T12:25:00</timestamp>
<currency>USD</currency>
<balance>2703.35</balance>
<interest rounding = "down">27.55</interest>
</accountSummary>
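To see what the schema enforces, it may help to look at a message that a validating parser should reject against accountSummary-1.0.xsd. The values below are invented for illustration: "1" does not match the M.N version pattern, "CHF" is a real ISO-4217 code (Swiss Franc) that is simply not in our subset, and "truncate" is not a member of roundingDirection.

```xml
<!-- Illustrative only: this message should fail validation in three places. -->
<!-- 1. version "1" does not match the pattern [1-9]+[0-9]*\.[0-9]+ -->
<accountSummary
    version = "1"
    xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation = "accountSummary-1.0.xsd">
  <timestamp>2003-01-01T12:25:00</timestamp>
  <!-- 2. CHF is not among the enumerated iso3currency values -->
  <currency>CHF</currency>
  <balance>2703.35</balance>
  <!-- 3. "truncate" is not among the enumerated roundingDirection values -->
  <interest rounding = "truncate">27.55</interest>
</accountSummary>
```

The rejected CHF value previews the maintenance problem: accepting a currency that the schema does not list means editing and reissuing the schema.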
Step 2: Isolate Volatile Controlled Vocabularies
When dealing with controlled vocabularies (enumerations) in schemas, it is a good idea to rate the volatility of each vocabulary. A volatile vocabulary is one which is expected to change independently of the normal release cycle of schema versions. A stable vocabulary is one which is expected to change (if at all) only as new schema versions are released. Volatile vocabularies are a problem if embedded in a schema because they impose extra releases on all dependent applications.
In our example of an account summary, the currency codes are a volatile
vocabulary: they are externally controlled by ISO, and currencies can be
added or removed by ISO at any time. On the other hand, the set of the
rounding directions {"up", "down", "nearest"} is unlikely to
change, so it is a stable vocabulary. From the point of view of somebody
maintaining an application which deals with account summaries, adding a
new rounding direction would mean writing, testing, and deploying a new
version of the application. Political pressure would dictate that rounding
values would only ever change as part of the planned release cycle of the
schema. So it makes sense to leave the roundingDirection
simple type embedded in the schema.
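One way this separation might look, as a minimal sketch: the volatile iso3currency type moves into its own schema document, and the main schema pulls it in with xsd:include. The file name iso3currency.xsd is an assumption made for illustration, and only two of the currency codes are shown.

```xml
<!-- iso3currency.xsd (hypothetical file name): holds only the volatile vocabulary -->
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name = "iso3currency">
    <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "EUR"/><!-- Euro -->
      <xsd:enumeration value = "USD"/><!-- US Dollar -->
      <!-- remaining codes omitted for brevity -->
      <xsd:length value = "3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

```xml
<!-- accountSummary-1.0.xsd: the embedded iso3currency type is replaced by an include -->
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
    version = "1.0"
    elementFormDefault = "qualified">
  <xsd:include schemaLocation = "iso3currency.xsd"/>
  <!-- element declarations and the roundingDirection type stay as before -->
</xsd:schema>
```

Because neither document declares a target namespace, xsd:include can combine them directly. This is only one possible arrangement; the point is that a change to the currency list would then touch only the small vocabulary file.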