XML.com: XML From the Inside Out

Managing Enumerations in W3C XML Schemas

February 05, 2003

Introduction

When working with data-oriented XML, there is often a requirement to handle "controlled vocabularies", otherwise known as enumerated values. Consider the following example of a bank account summary:

<accountSummary>
  <timestamp>2003-01-01T12:25:00</timestamp>
  <currency>USD</currency>
  <balance>2703.35</balance>
  <interest rounding="down">27.55</interest>
</accountSummary>

There are two controlled vocabularies in this document. One is the currency, which is an ISO-4217 3-letter currency code ("USD" is US Dollar). The other is the rounding direction for the interest, which can be "up", "down", or "nearest". The bank in this example prefers to round the interest down.

The problem in designing this schema is that the ISO 3-letter currency codes are externally controlled and can change at any time. If you embed them in your schema, you need to reissue the schema every time ISO makes a change, which can be expensive. This is especially true in enterprise situations where any schema change, no matter how small, can require full retesting of every application that uses the schema. Such forced reissues should be avoided whenever possible.

In this article, we will discuss how controlled vocabularies can be managed when using W3C XML Schemas, since this is the dominant XML schema format for data-oriented XML. Note that the "vocabularies" we refer to are enumerated lists of element and attribute values. This differs from other contexts, where "vocabularies" are sets of XML element names.

Step 1: Monolithic Schema

Before worrying about which controlled vocabularies are out of our control, the first thing to do is create a schema, using W3C XML Schema, for the account summaries. For the purposes of this article, we will use just a subset of the ISO 3-letter currency codes. A suitable schema is:

<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
  version = "1.0"
  elementFormDefault = "qualified">

  <xsd:element name = "accountSummary">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "timestamp"/>
        <xsd:element ref = "currency"/>
        <xsd:element ref = "balance"/>
        <xsd:element ref = "interest"/>
      </xsd:sequence>
      <xsd:attribute name = "version" use = "required">
        <xsd:simpleType>
          <xsd:restriction base = "xsd:string">
            <xsd:pattern value = "[1-9]+[0-9]*\.[0-9]+"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name = "timestamp" type = "xsd:dateTime"/>

  <xsd:element name = "currency" type = "iso3currency"/>

  <xsd:element name = "balance" type = "xsd:decimal"/>

  <xsd:element name = "interest">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base = "xsd:decimal">
          <xsd:attribute name = "rounding" use = "required"
                         type = "roundingDirection"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>

  <xsd:simpleType name = "iso3currency">
    <xsd:annotation>
      <xsd:documentation>ISO-4217 3-letter currency codes,
as defined at
http://www.bsi-global.com/Technical+Information/Publications/_Publications/tig90.xalter
or available from
http://www.xe.com/iso4217.htm
Only a subset are defined here.</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "AUD"/><!-- Australian Dollar -->
      <xsd:enumeration value = "BRL"/><!-- Brazilian Real -->
      <xsd:enumeration value = "CAD"/><!-- Canadian Dollar -->
      <xsd:enumeration value = "CNY"/><!-- Chinese Yuan -->
      <xsd:enumeration value = "EUR"/><!-- Euro -->
      <xsd:enumeration value = "GBP"/><!-- British Pound -->
      <xsd:enumeration value = "INR"/><!-- Indian Rupee -->
      <xsd:enumeration value = "JPY"/><!-- Japanese Yen -->
      <xsd:enumeration value = "RUR"/><!-- Russian Rouble -->
      <xsd:enumeration value = "USD"/><!-- US Dollar -->
      <xsd:length value = "3"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name = "roundingDirection">
    <xsd:annotation>
      <xsd:documentation>Whether the interest is
rounded up, down or to the
nearest round value.</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "up"/>
      <xsd:enumeration value = "down"/>
      <xsd:enumeration value = "nearest"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>

Notice the two controlled vocabularies (enumerations): the simple types iso3currency and roundingDirection. For iso3currency, the length of the enumeration strings is explicitly set to 3, which guards against careless editing errors when the list of currencies is updated in the future.
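
To see the enumerations at work, consider a hypothetical instance (not part of the original example) that a validating parser should reject when checked against the schema above. Enumeration values are compared exactly, so case matters, and any value outside the list fails validation:

```xml
<!-- Hypothetical invalid instance, for illustration only -->
<accountSummary version = "1.0">
  <timestamp>2003-01-01T12:25:00</timestamp>
  <!-- Rejected: "usd" is not in iso3currency; only "USD" is listed -->
  <currency>usd</currency>
  <balance>2703.35</balance>
  <!-- Rejected: "sideways" is not in roundingDirection -->
  <interest rounding = "sideways">27.55</interest>
</accountSummary>
```

The version attribute is present because the schema declares it as required.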

Note also that the schema's optional version attribute has been set to "1.0". When working with data-oriented XML messages, it is usually necessary to support multiple versions of the message schema concurrently, as the systems that use the message schema will probably not be able to upgrade to the latest version simultaneously. So, it is vital to identify the schema version that an XML message was validated against. In keeping with this, we will name our schema accountSummary-1.0.xsd, so that future versions won't overwrite the current version.

Further, a version attribute has been added to the accountSummary element, so that message instances clearly identify their schema version. It is assumed that the version numbers have the form M.N, where M is the major version number and N is the minor version number. With this change, plus the schema, the account summary now becomes:

<accountSummary
  version = "1.0"
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation = "accountSummary-1.0.xsd">
  <timestamp>2003-01-01T12:25:00</timestamp>
  <currency>USD</currency>
  <balance>2703.35</balance>
  <interest rounding = "down">27.55</interest>
</accountSummary>
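
The pattern used for the version attribute accepts M.N strings but also tolerates leading zeros in the minor number, such as "1.05". If that matters, one possible tightening is sketched below as a named type. The name versionNumber and the stricter pattern are assumptions made for illustration; they are not part of the schema above.

```xml
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical named type; the schema in this article declares
       the restriction anonymously inside the version attribute -->
  <xsd:simpleType name = "versionNumber">
    <xsd:restriction base = "xsd:string">
      <!-- M: no leading zero. N: either "0" or a number with no leading zero -->
      <xsd:pattern value = "[1-9][0-9]*\.(0|[1-9][0-9]*)"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Under this sketch, "1.0" and "12.3" would be accepted, while "1.05" and "0.1" would be rejected. Whether the extra strictness is worthwhile depends on how version numbers are compared by the consuming applications.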

Step 2: Isolate Volatile Controlled Vocabularies

When dealing with controlled vocabularies (enumerations) in schemas, it is a good idea to rate the volatility of each vocabulary. A volatile vocabulary is one which is expected to change independently of the normal release cycle of schema versions. A stable vocabulary is one which is expected to change (if at all) only as new schema versions are released. Volatile vocabularies are a problem if embedded in a schema because they impose extra releases on all dependent applications.

In our example of an account summary, the currency codes are a volatile vocabulary: they are externally controlled by ISO, and currencies can be added or removed by ISO at any time. On the other hand, the set of the rounding directions {"up", "down", "nearest"} is unlikely to change, so it is a stable vocabulary. From the point of view of somebody maintaining an application which deals with account summaries, adding a new rounding direction would mean writing, testing, and deploying a new version of the application. Political pressure would dictate that rounding values would only ever change as part of the planned release cycle of the schema. So it makes sense to leave the roundingDirection simple type embedded in the schema.
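
As a rough sketch of what such isolation might look like, the iso3currency simple type could be moved, unchanged, into its own schema document and pulled into the main schema with xsd:include. The file name iso3currency.xsd is a hypothetical choice, and this is only one of several possible mechanisms, not necessarily the one adopted in the rest of this article.

```xml
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
  version = "1.0"
  elementFormDefault = "qualified">

  <!-- Hypothetical file holding only the iso3currency simple type.
       Neither document has a target namespace, so xsd:include applies. -->
  <xsd:include schemaLocation = "iso3currency.xsd"/>

  <!-- Unchanged: the element still refers to the type by name -->
  <xsd:element name = "currency" type = "iso3currency"/>

  <!-- The stable vocabulary stays embedded in the main schema -->
  <xsd:simpleType name = "roundingDirection">
    <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "up"/>
      <xsd:enumeration value = "down"/>
      <xsd:enumeration value = "nearest"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

With a split like this, an ISO change would mean reissuing only the currency file. Whether dependent applications still need retesting depends on how they consume the list.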
