Managing Enumerations in W3C XML Schemas
by Anthony Coates
However, it is unlikely that an application would need to be recoded just to handle a change in the set of currency codes; if it did, that would be a sign of an inflexible design. As the currency codes are externally controlled, they need to be isolated: we do that by creating a separate vocabulary schema for them. A vocabulary schema is one which contains a single simple type definition with enumerated values and nothing else. The vocabulary schema for the currencies is
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
            version = "1.0"
            elementFormDefault = "qualified">
  <xsd:simpleType name = "iso3currency">
    <xsd:annotation>
      <xsd:documentation>ISO-4217 3-letter currency codes,
        as defined at
        http://www.bsi-global.com/Technical+Information/Publications/_Publications/tig90.xalter
        or available from
        http://www.xe.com/iso4217.htm
        Only a subset is defined here.</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "AUD"/><!-- Australian Dollar -->
      <xsd:enumeration value = "BRL"/><!-- Brazilian Real -->
      <xsd:enumeration value = "CAD"/><!-- Canadian Dollar -->
      <xsd:enumeration value = "CNY"/><!-- Chinese Yuan Renminbi -->
      <xsd:enumeration value = "EUR"/><!-- Euro -->
      <xsd:enumeration value = "GBP"/><!-- British Pound -->
      <xsd:enumeration value = "INR"/><!-- Indian Rupee -->
      <xsd:enumeration value = "JPY"/><!-- Japanese Yen -->
      <xsd:enumeration value = "RUR"/><!-- Russian Rouble -->
      <xsd:enumeration value = "USD"/><!-- US Dollar -->
      <xsd:length value = "3"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
and is named iso3currency-1.0.xsd. As you can see, the
currency vocabulary now has its own version number and, thus, its own
release cycle. The vocabulary schema can now be included in the new
version (1.1) of the main message schema:
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
            version = "1.1"
            elementFormDefault = "qualified">
  <xsd:include schemaLocation = "iso3currency-1.0.xsd"/>
  <xsd:element name = "accountSummary">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref = "timestamp"/>
        <xsd:element ref = "currency"/>
        <xsd:element ref = "balance"/>
        <xsd:element ref = "interest"/>
      </xsd:sequence>
      <xsd:attribute name = "version" use = "required">
        <xsd:simpleType>
          <xsd:restriction base = "xsd:string">
            <xsd:pattern value = "[1-9]+[0-9]*\.[0-9]+"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name = "timestamp" type = "xsd:dateTime"/>
  <xsd:element name = "currency" type = "iso3currency"/>
  <xsd:element name = "balance" type = "xsd:decimal"/>
  <xsd:element name = "interest">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base = "xsd:decimal">
          <xsd:attribute name = "rounding" use = "required" type = "roundingDirection"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name = "roundingDirection">
    <xsd:annotation>
      <xsd:documentation>Whether the interest is
        rounded up, down or to the
        nearest round value.</xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "up"/>
      <xsd:enumeration value = "down"/>
      <xsd:enumeration value = "nearest"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
and this is accountSummary-1.1.xsd according to our naming
scheme. Note that the currency codes no longer appear in the main
schema.
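The main schema also restricts the version attribute with an xsd:pattern. XSD patterns are implicitly anchored to the whole attribute value. In Python, the equivalent test therefore uses re.fullmatch rather than re.search. The sketch below lets you experiment with which version strings the pattern accepts. is_valid_version is a hypothetical helper, not part of the article's example files.

```python
import re

# The pattern from the version attribute in accountSummary-1.1.xsd.
# XSD patterns always match the whole value, so re.fullmatch is used
# (re.search would also accept strings that merely contain a match).
VERSION_PATTERN = re.compile(r"[1-9]+[0-9]*\.[0-9]+")


def is_valid_version(value: str) -> bool:
    """Return True if value satisfies the schema's version pattern."""
    return VERSION_PATTERN.fullmatch(value) is not None
```

With this pattern, "1.1" and "10.2" are accepted. "0.9", "1" and "v1.2" are rejected, because the major version must start with a non-zero digit and a minor version is required.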
Step 3: Decouple Controlled Vocabularies
The problem with accountSummary-1.1.xsd is that it
directly includes iso3currency-1.0.xsd. When a new version of
the ISO currency vocabulary schema is released, you still
have to release a new version of the account summary schema. What is
needed is a mechanism to decouple the vocabulary schema versions from the
main schema versions. The simple solution is to use an unversioned
"pass-through" vocabulary schema:
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
            elementFormDefault = "qualified">
  <xsd:include schemaLocation = "iso3currency-1.0.xsd"/>
</xsd:schema>
This unversioned vocabulary schema has no version
attribute and is named iso3currency.xsd. To complete the
decoupling, a new version of the main schema,
accountSummary-1.2.xsd, is released. The only change from
version 1.1 is that the <xsd:include> changes from
<xsd:include schemaLocation = "iso3currency-1.0.xsd"/>
to
<xsd:include schemaLocation = "iso3currency.xsd"/>
so that the unversioned currency vocabulary schema is included. The
decoupling is now complete. If ISO changes the list of currency codes, a
new currency schema is released and iso3currency.xsd is
updated so that it includes the new currency schema. The main schema does
not need to be changed, since it includes iso3currency.xsd
and is agnostic to the version of the currency vocabulary schema.
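The release step just described is easy to script. The sketch below assumes the file names used in this article and a plain list of currency codes. The functions write_vocabulary_schema and point_pass_through_at are hypothetical helpers, not part of any existing tool. The sketch writes a new versioned vocabulary schema and then rewrites the unversioned iso3currency.xsd so that it includes the new file. The main schema is left untouched. For brevity it omits the xsd:annotation and the per-code comments of the hand-written version.

```python
from pathlib import Path
from typing import Iterable
from xml.sax.saxutils import quoteattr

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def write_vocabulary_schema(directory: Path, version: str, codes: Iterable[str]) -> Path:
    """Write iso3currency-<version>.xsd holding a single enumerated simple type."""
    lines = [
        f'<xsd:schema xmlns:xsd = "{XSD_NS}"',
        f"            version = {quoteattr(version)}",
        '            elementFormDefault = "qualified">',
        '  <xsd:simpleType name = "iso3currency">',
        '    <xsd:restriction base = "xsd:string">',
    ]
    # One enumeration per distinct code, sorted so successive releases diff cleanly.
    for code in sorted(set(codes)):
        lines.append(f"      <xsd:enumeration value = {quoteattr(code)}/>")
    lines += [
        '      <xsd:length value = "3"/>',
        "    </xsd:restriction>",
        "  </xsd:simpleType>",
        "</xsd:schema>",
    ]
    path = directory / f"iso3currency-{version}.xsd"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def point_pass_through_at(directory: Path, versioned_name: str) -> Path:
    """Rewrite the unversioned iso3currency.xsd so that it includes versioned_name."""
    text = (
        f'<xsd:schema xmlns:xsd = "{XSD_NS}"\n'
        '            elementFormDefault = "qualified">\n'
        f"  <xsd:include schemaLocation = {quoteattr(versioned_name)}/>\n"
        "</xsd:schema>\n"
    )
    path = directory / "iso3currency.xsd"
    path.write_text(text, encoding="utf-8")
    return path
```

A release would call write_vocabulary_schema(Path("."), "1.1", codes) and then point_pass_through_at(Path("."), "iso3currency-1.1.xsd"). Only the pass-through file changes, so accountSummary-1.2.xsd needs no new version.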
Step 4: Protect Applications
Decoupling vocabulary schemas like this is not without issues. First, as new versions of the currency vocabulary schema are released, existing instance files will become invalid if they contain currency codes which ISO has deleted. In some situations that would be unacceptable, but it makes sense here. If an instance file refers to a currency code that no longer exists, then it has become semantically invalid; it is not unreasonable for it to become syntactically invalid too. The invalid syntax can then be used to detect such instances and route them for special processing, so that the code in the main application can focus on what to do with valid currency codes. Being able to remove error handling from the main application means the main application code remains smaller and easier to maintain.
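Python's standard library has no W3C XML Schema validator, so the following is only an approximation of the routing idea. It defines a hypothetical route_instance function. The function checks the currency element of an accountSummary instance against a set of allowed codes and sends unknown codes down a separate path. In a real system, the decision would come from validating the instance against the schema itself.

```python
import xml.etree.ElementTree as ET
from typing import AbstractSet


def route_instance(xml_text: str, allowed_codes: AbstractSet[str]) -> str:
    """Return "main" for a known currency code, otherwise "special".

    Approximates schema validation by checking only the <currency> child
    of an accountSummary instance (no target namespace, as in the article).
    """
    root = ET.fromstring(xml_text)
    code = root.findtext("currency")
    # xsd:string preserves whitespace, so the code is compared exactly.
    if code is not None and code in allowed_codes:
        return "main"
    return "special"
```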
Second, with the currency codes able to change at any time, there needs
to be synchronization between the currency codes in the currency
vocabulary schema and the currency codes known to the applications. There
are two solutions to this. The first is that applications can use the
vocabulary schema as the source of the currency codes. Treating the
vocabulary schema as an XML file, a quick SAX parse is all you need to
pull out the <xsd:enumeration> elements containing the
allowed values. The second solution is to keep the currency codes in a
table in a central relational database. Applications can access this table directly,
while the vocabulary schema can be dynamically generated from the same
table. Either method keeps the set of allowed values synchronized across
applications.
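A minimal sketch of the first approach uses Python's built-in xml.sax module. Namespace processing is turned on so that only elements in the W3C XML Schema namespace are matched. The class and function names are illustrative only. Point the parser at a versioned file such as iso3currency-1.0.xsd. SAX does not follow xsd:include, so the pass-through iso3currency.xsd on its own yields no values.

```python
from typing import List
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces

XSD_NS = "http://www.w3.org/2001/XMLSchema"


class EnumerationCollector(ContentHandler):
    """Collects the value attribute of every xsd:enumeration element."""

    def __init__(self) -> None:
        super().__init__()
        self.values: List[str] = []

    def startElementNS(self, name, qname, attrs):
        uri, local_name = name
        if uri == XSD_NS and local_name == "enumeration":
            # Unprefixed attributes are in no namespace: key is (None, "value").
            self.values.append(attrs.getValue((None, "value")))


def enumeration_values(source) -> List[str]:
    """Parse a vocabulary schema given as a file name or binary file object."""
    handler = EnumerationCollector()
    parser = make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.values
```

Because SAX streams the document, this stays cheap even for a full ISO 4217 list. Other facets such as xsd:length are simply ignored.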
Third, using such vocabulary schemas is only workable if applications can rely on them changing in just two ways: an enumerated value is either added or deleted.
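Because additions and deletions are the only permitted changes, the difference between two releases of a vocabulary is fully described by two sets of codes. The hypothetical diff_vocabularies function below computes them. Its input could be, for example, the lists of enumeration values read from two versioned schema files. The removed set is the one to watch, since existing instances using those codes will become invalid.

```python
from typing import FrozenSet, Iterable, NamedTuple


class VocabularyChange(NamedTuple):
    added: FrozenSet[str]    # codes new in the later version
    removed: FrozenSet[str]  # codes dropped; instances using them become invalid


def diff_vocabularies(old: Iterable[str], new: Iterable[str]) -> VocabularyChange:
    """Describe a vocabulary release purely as additions and deletions."""
    old_set, new_set = frozenset(old), frozenset(new)
    return VocabularyChange(added=new_set - old_set, removed=old_set - new_set)
```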
Vocabulary schemas must never change structurally. If a new simple type, complex type, or element definition were added to a vocabulary schema, it could change the results of validating an instance with the main schema and cause a major application failure. So vocabulary schemas need to be "validated" to ensure that they contain just a single simple type definition with enumerated values. This is exactly the situation Will Provost described in "Working with a Metaschema".
An obvious solution would be to write a schema for vocabulary schemas as the metaschema. In practice I don't do this. The existing "Schema for Schemas" is known not to be 100% correct in describing the W3C XML Schema syntax, and so schema editing tools use it as an indicative rather than a normative guide. This means that schema editors tend to ignore any attempt to impose a metaschema on a schema. For this reason, and because the vocabulary schema format is quite simple, I use the following Schematron schema:
<sch:schema
    xmlns:sch = "http://www.ascc.net/xml/schematron"
    xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
    xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation =
      "http://www.ascc.net/xml/schematron schematron-1.5.xsd">
  <sch:title>Controlled vocabulary validation</sch:title>
  <!-- XPath names are prefixed with xsd: (declared here); in XPath 1.0 -->
  <!-- unprefixed names only match elements in no namespace. -->
  <sch:ns prefix = "xsd" uri = "http://www.w3.org/2001/XMLSchema"/>
  <!-- The input is assumed to be a valid W3C XML Schema. -->
  <!-- This just checks that it is also a valid -->
  <!-- vocabulary schema. -->
  <sch:pattern name = "controlled-vocabulary-schema">
    <sch:rule context = "xsd:schema">
      <sch:assert test = "count(*) = count(xsd:simpleType[@name])"
        >The schema may contain only named simple type
        definitions.</sch:assert>
      <sch:assert test = "count(xsd:simpleType[@name]) = 1"
        >The schema must contain a single simpleType
        definition.</sch:assert>
    </sch:rule>
    <sch:rule context = "xsd:simpleType">
      <sch:assert test = "@name"
        >The simpleType must have a name.</sch:assert>
      <sch:assert test = "count(xsd:restriction) = 1"
        >The simpleType must contain a
        single restriction.</sch:assert>
      <sch:assert test = "count(*) = count(xsd:annotation) + count(xsd:restriction)"
        >The simpleType may have an annotation as well as its
        restriction, but no other structure.</sch:assert>
    </sch:rule>
    <sch:rule context = "xsd:restriction">
      <sch:assert test = "xsd:enumeration"
        >A restriction must contain enumerated values.</sch:assert>
    </sch:rule>
    <sch:rule context = "xsd:enumeration">
      <sch:key name = "enumerationsByValue" path = "@value"/>
      <sch:assert test = "count(key('enumerationsByValue', @value)) = 1"
        >An enumerated value must be unique.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
Under Windows, you can validate a vocabulary schema against this Schematron schema using the free validator from Topologi. For other platforms, see the list of tools in the Schematron Resource Directory. Chimezie Ogbuji introduced Schematron in "Validating XML with Schematron".
Schematron assertions are expressed using XPath expressions which must
evaluate to true. If they evaluate to false, a
Schematron validation error is generated. In our Schematron schema, note
the following:
- Look at the rule for the schema context. It contains the assertions that are applied to the <xsd:schema> element in the vocabulary schema. The first assertion checks that the schema contains nothing but named <xsd:simpleType> definitions. The second assertion checks that there is exactly one such definition.
- The rule for the simpleType context asserts that the <xsd:simpleType> must have a name attribute, that it may contain an <xsd:annotation> and must contain exactly one <xsd:restriction>, and that it cannot contain any other elements.
- The rule for the restriction context asserts that the <xsd:restriction> must contain one or more enumerated values.
- The rule for the enumeration context asserts that the enumeration values must be unique. This is checked using a Schematron key (equivalent to an XSLT key). The expression key('enumerationsByValue', @value) returns the set of <xsd:enumeration> elements with the same value as the element being validated. If the values are unique, there will always be just one <xsd:enumeration> element in that set: the one being validated.
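Where no Schematron processor is at hand, the same four rules can be approximated with Python's ElementTree. The sketch below is a stand-in, not the author's tool, and check_vocabulary_schema is a hypothetical name. Like the Schematron schema, it assumes the input is already a valid W3C XML Schema whose root is xsd:schema. It returns the assertion messages that would fire.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

XSD = "{http://www.w3.org/2001/XMLSchema}"


def check_vocabulary_schema(root: ET.Element) -> List[str]:
    """Apply the four Schematron rules; an empty list means the schema passed."""
    errors: List[str] = []

    # Rule for the schema context: nothing but one named simpleType.
    children = list(root)
    named = [c for c in children
             if c.tag == XSD + "simpleType" and c.get("name") is not None]
    if len(children) != len(named):
        errors.append("The schema may contain only named simple type definitions.")
    if len(named) != 1:
        errors.append("The schema must contain a single simpleType definition.")

    # Rule for the simpleType context.
    for simple_type in root.iter(XSD + "simpleType"):
        if simple_type.get("name") is None:
            errors.append("The simpleType must have a name.")
        kids = list(simple_type)
        n_restrictions = sum(1 for k in kids if k.tag == XSD + "restriction")
        n_annotations = sum(1 for k in kids if k.tag == XSD + "annotation")
        if n_restrictions != 1:
            errors.append("The simpleType must contain a single restriction.")
        if len(kids) != n_restrictions + n_annotations:
            errors.append("The simpleType may have an annotation as well as its "
                          "restriction, but no other structure.")

    # Rule for the restriction context: at least one enumeration child.
    for restriction in root.iter(XSD + "restriction"):
        if restriction.find(XSD + "enumeration") is None:
            errors.append("A restriction must contain enumerated values.")

    # Rule for the enumeration context: values must be unique (the key check).
    counts = Counter(e.get("value") for e in root.iter(XSD + "enumeration")
                     if e.get("value") is not None)
    for value, count in sorted(counts.items()):
        if count > 1:
            errors.append(f"An enumerated value must be unique: {value}")
    return errors
```

As with the Schematron version, other facets such as xsd:length are allowed inside the restriction. The unversioned pass-through schema fails both schema-level checks, because it contains an xsd:include rather than a simple type.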
Conclusion
WXS schemas can be made more manageable by separating volatile controlled vocabularies (enumerations) into their own vocabulary schemas. In this article, we have seen how to identify volatile controlled vocabularies, how to separate them from the main schema, how to decouple the versions, and how to validate vocabulary schemas. There is no absolute rule for when a controlled vocabulary should have its own schema. Use the guidelines here, but always use your own judgment and your knowledge of your problem domain.
Resources
- The example files from this article are available as a ZIP archive (9K).
