XML.com: XML From the Inside Out

A Smoother Change to Version 2.0
by Marc de Graauw

IgnoreUnknown and MustUnderstand Semantics

If two language versions, L1 and L2, are forward-compatible, we do not expect L1 to process all L2 syntax. As with HTML, we just expect the earlier application to accept documents in later versions of the language and show what can be shown. This is what we call "IgnoreUnknown" semantics. It is also where "MustUnderstand" comes in: some information simply may not be ignored, which is frequently the case with information related to security. SOAP provides a mechanism for SOAP headers to achieve this:

<my:security-header soap:mustUnderstand = "1">

If the mustUnderstand attribute is set to "1", an application may only process the message if it understands the semantics of this header. MustUnderstand overrides IgnoreUnknown.
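To make the rule concrete, here is a minimal Python sketch of a receiver that refuses a message when a mandatory header is unknown to it. It assumes the SOAP 1.1 envelope namespace; the `urn:example:*` namespaces, the header names other than `security-header`, and the function name `headers_not_understood` are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope

# Hypothetical: the header blocks this receiver knows how to handle.
UNDERSTOOD_HEADERS = frozenset({"{urn:example:my}security-header"})


def headers_not_understood(envelope_xml, understood=UNDERSTOOD_HEADERS):
    """Return the tags of mandatory header blocks the receiver does not know."""
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    missing = []
    for block in header:
        mandatory = block.get(f"{{{SOAP_NS}}}mustUnderstand") == "1"
        # IgnoreUnknown applies unless the block is flagged as mandatory.
        if mandatory and block.tag not in understood:
            missing.append(block.tag)
    return missing


MESSAGE = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:my="urn:example:my" xmlns:other="urn:example:other">
  <soap:Header>
    <my:security-header soap:mustUnderstand="1"/>
    <other:routing-hint/>
    <other:audit soap:mustUnderstand="1"/>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>"""

if __name__ == "__main__":
    # The unknown routing-hint is ignored; the unknown mandatory audit is not.
    print(headers_not_understood(MESSAGE))
```

A receiver would process the message only when the returned list is empty, and return a fault otherwise.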

IgnoreUnknown works well for browsers, but sometimes understanding is simply mandatory. Again, this is true for nearly everything related to security, and for much of reliable messaging and transaction processing as well. It is also often true in environments such as health care or finance: if you do not understand the information I sent you, I'd rather have you reject the message and call me than ignore the dosage in the medical prescription I sent, or the maximum on the stock order I submitted. Some things need to be understood. SOAP mustUnderstand semantics are not very flexible, however: mustUnderstand works only for SOAP headers. It could be extended to cover elements in SOAP:Body as well, but this potentially adds an attribute to every element in the tree -- yuck! It also works only at the level of an entire element, so it cannot mark a single attribute or value as mandatory. There must be a better way.

The Capability Compatibility Design Pattern

One of the principles that follows from the discussion of compatibility is that a sender knows which language version was used to create a message, and the capabilities of that language and of earlier versions. So the sender can put this information in the message itself. Of course the language version L4 that was used to produce a message is suitable for understanding it, so any receiver that understands L4 may process it. The sender can also know whether this particular message uses any new items introduced in version L4. Maybe it uses only items already in the previous language version, L3. So the sender could indicate in the message that any L3 receiver may process it. Ditto for L2 and L1.

If the message does contain new L4 items, and those items can be safely ignored, the sender can also list L3 as sufficient for processing. If the message contains items from L4 that must be understood, the sender will list only the L4 capability as sufficient for processing the message. The receiver, in turn, knows which version was used to build the receiving software, its capabilities, and the capabilities of earlier versions. So a receiver built using language version L5 will know whether it can safely process L4 messages (it usually can -- but sometimes language changes will not be backward-compatible). So if we put the version information into the message itself, the receiver can calculate whether it may process the message or not; if it may not, the receiver can return an error message.
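The receiver's calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: versions are plain integers, each receiver carries a set of the versions it understands, and the names `may_process`, `handle`, `L3_RECEIVER`, and `L5_RECEIVER` are made up for the example.

```python
# Hypothetical receivers: each set lists the language versions the
# receiving software is able to understand.
L3_RECEIVER = frozenset({1, 2, 3})
L5_RECEIVER = frozenset({1, 2, 3, 4, 5})


def may_process(required_versions, understood_versions):
    """True if the receiver understands at least one version the sender listed."""
    return any(v in understood_versions for v in required_versions)


def handle(required_versions, understood_versions):
    """Process the message or answer with an error, as described above."""
    if may_process(required_versions, understood_versions):
        return "process"
    return "error: none of the versions %s is supported" % sorted(required_versions)


if __name__ == "__main__":
    # L4 message whose new items are ignorable: sender lists L3 and L4.
    print(handle([3, 4], L3_RECEIVER))
    # L4 message with items that must be understood: sender lists only L4.
    print(handle([4], L3_RECEIVER))
    print(handle([4], L5_RECEIVER))
```

A later version that is not backward-compatible with L4 would be modeled here by leaving 4 out of that receiver's set.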

Figure 3. L3 and L4 compatibility

This "Capability Compatibility Design Pattern" extends well beyond elements. Of course any attribute in a particular language version can be handled in exactly the same way. More than this, the Pattern easily handles element content as well. If we have an L4 code list with values "Standard" and "Handle with Care," and then L5 introduces a code "Unknown," it can be ignored, and L4 receivers may process it. If L5 contains a new code "Hazardous," this may not be ignored -- only L5 receivers may use such a message for subsequent transport of associated goods. In fact, the Capability Compatibility Design Pattern can handle any type of change in the language. And instead of requiring mustUnderstand attributes sprinkled throughout the entire document, a single list with a couple of language versions required for processing is sufficient.
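The sender's side of the pattern can be sketched the same way. The following Python fragment is an assumption-laden illustration, not a prescribed algorithm: it supposes versions are consecutive integers, that the sender knows for every item it uses when the item was introduced and whether it may be ignored, and that the code list itself first appeared in L4. The `Item` type and `sufficient_versions` function are hypothetical names.

```python
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    introduced_in: int  # language version that first defined the item
    ignorable: bool     # True if older receivers may safely skip it


def sufficient_versions(items_used, sender_version):
    """List every version whose receivers may process the message."""
    result = []
    for version in range(1, sender_version + 1):
        # Items this version has never heard of must all be ignorable.
        newer = [item for item in items_used if item.introduced_in > version]
        if all(item.ignorable for item in newer):
            result.append(version)
    return result


# Assumption: the code list was introduced in L4 and must be understood.
STANDARD = Item("Standard", 4, False)
UNKNOWN = Item("Unknown", 5, True)       # new in L5, may be ignored
HAZARDOUS = Item("Hazardous", 5, False)  # new in L5, must be understood

if __name__ == "__main__":
    print(sufficient_versions([STANDARD, UNKNOWN], 5))
    print(sufficient_versions([STANDARD, HAZARDOUS], 5))
```

Under these assumptions a message using "Unknown" lists L4 and L5 as sufficient, while a message using "Hazardous" lists only L5.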

Let's do a walkthrough of the Capability Compatibility Design Pattern. In each example a new type of version change is shown, as well as the way it is handled by the Capability Compatibility Design Pattern.

The Medication Example

We'll start with a language used by physicians to send medication prescriptions to apothecaries. Here is version 1:

<?xml version="1.0" encoding="UTF-8"?>
<message version="1">
    <require>
        <version>1</version>
    </require>
    <prescription>
        <medication>aspirin</medication>
        <amount>24</amount>
    </prescription>
</message>

We'll ignore all details such as patient IDs, namespaces, etc., and focus on the medication and the versioning information. In a <require> element we list the versions that may accept our message -- just version 1 for version 1 of the language. Normally, using a URI to identify the version would be the thing to do, but for brevity, I've used just integers in the examples. L1 processors will also need the capability to ignore unknown tags. I've supplied an XSLT script that transforms an Lx document to L1 by removing all unknown elements in the <prescription> element. There are other mechanisms -- using NVDL, authoring XML Schemas with wildcards, doing this in Java or C on the server -- but this will do here. The important thing is that any language that uses the Capability Compatibility Design Pattern must have a mechanism for ignoring unknown content. The processing model for the language is:

  1. Check the required versions.
  2. If none of them is available, return an error.
  3. Strip unknown content with the stylesheet.
  4. Validate against schema.
  5. Dispatch for further processing.
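The five steps can be sketched for an L1 receiver as follows. This is a rough Python illustration, not the article's XSLT: stripping is done directly on the parsed tree, a simple structural check stands in for schema validation, and the two newer messages (including the `<note>` element) are hypothetical. The XML declaration is left out of the sample strings for brevity.

```python
import xml.etree.ElementTree as ET

SUPPORTED_VERSIONS = {"1"}                 # this receiver understands L1
KNOWN_ELEMENTS = {"medication", "amount"}  # children of <prescription> in L1


class UnsupportedVersion(Exception):
    """Raised when the receiver supports none of the required versions."""


def process(message_xml):
    root = ET.fromstring(message_xml)

    # Steps 1 and 2: check the required versions, fail if none is supported.
    required = {v.text.strip() for v in root.findall("require/version") if v.text}
    if not required & SUPPORTED_VERSIONS:
        raise UnsupportedVersion(sorted(required))

    # Step 3: strip unknown content (stand-in for the stylesheet).
    prescription = root.find("prescription")
    if prescription is None:
        raise ValueError("missing <prescription>")
    for child in list(prescription):
        if child.tag not in KNOWN_ELEMENTS:
            prescription.remove(child)

    # Step 4: validate (stand-in for schema validation).
    tags = [child.tag for child in prescription]
    if tags != ["medication", "amount"]:
        raise ValueError("invalid prescription content: %r" % tags)

    # Step 5: dispatch; here we simply return the extracted data.
    return {
        "medication": prescription.findtext("medication"),
        "amount": int(prescription.findtext("amount")),
    }


V1_MESSAGE = """<message version="1">
  <require><version>1</version></require>
  <prescription><medication>aspirin</medication><amount>24</amount></prescription>
</message>"""

# Hypothetical later-version message whose new element may be ignored.
NEWER_IGNORABLE = """<message version="2">
  <require><version>1</version><version>2</version></require>
  <prescription><medication>aspirin</medication><amount>24</amount>
    <note>take with water</note></prescription>
</message>"""

# Hypothetical later-version message that only version 2 receivers may process.
NEWER_MANDATORY = NEWER_IGNORABLE.replace("<version>1</version>", "")
```

With these samples, the L1 receiver processes the first two messages identically and raises `UnsupportedVersion` for the third.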

Here's a zip file with all sample XML, all "ignore unknown" stylesheets, and schemas for the examples in the article.
