XML.com: XML From the Inside Out


A Theory of Compatible Versions

December 20, 2006

Making versioning work in practice is one of the hard problems in computing. Arguably, the Web was able to grow so dramatically in popularity because evolution and versioning were built into HTML and HTTP. Both provide explicit extensibility points, together with rules for understanding extensions, that enable decentralized extension and versioning. This article describes a set-based model of explicit extensibility and of understanding extensions that maximizes the versioning capabilities of any language, including languages defined by XML Schema or other XML vocabulary formalisms. Using some simple set theory, we will show that providing extensibility in the first version of a language is the key to compatible evolution.

We will start with an example. Consider a Name Language that specifies that a name contains a first followed by a last, in that order. The first and last must contain alphabetic characters only. We call instances or occurrences of the language "Texts." Any Text that is valid according to the Name Language is in the Defined Text Set.

We can model the Defined Text Set graphically as:

Figure 1.

In our example, we'll do the necessary thing and plan for extensibility. The name can be extended, so the Name Language defines that it will accept anything after the last. The content model is first, last, *. For example, the sequence first, last, middle is accepted. With this extensibility, every Text in the Defined Text Set is accepted, and more besides. The set of Texts that the language accepts is called the Accept Text Set. By necessity, the Accept Text Set is larger than the Defined Text Set; in set-theoretic terms, it is a proper superset. Graphically, this is:

Figure 2.
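The two sets can also be sketched as membership tests. This is a minimal illustration rather than anything from the Name Language itself: a Text is modeled as a list of (name, value) pairs, "alpha characters" is assumed to mean ASCII letters, and the function names and sample values are hypothetical.

```python
import re

# Assumption: "alpha characters" means one or more ASCII letters.
ALPHA = re.compile(r"[A-Za-z]+")

def _starts_with_valid_name(text):
    """True if the Text begins with a valid first, last pair, in order."""
    if len(text) < 2:
        return False
    (n1, v1), (n2, v2) = text[0], text[1]
    return (n1 == "first" and n2 == "last"
            and ALPHA.fullmatch(v1) is not None
            and ALPHA.fullmatch(v2) is not None)

def in_defined_text_set(text):
    """Content model: first, last (and nothing else)."""
    return len(text) == 2 and _starts_with_valid_name(text)

def in_accept_text_set(text):
    """Content model: first, last, * (anything may follow last)."""
    return _starts_with_valid_name(text)

plain = [("first", "Ada"), ("last", "Lovelace")]
extended = plain + [("middle", "King")]

print(in_defined_text_set(plain), in_accept_text_set(plain))        # True True
print(in_defined_text_set(extended), in_accept_text_set(extended))  # False True
```

Every Text that passes `in_defined_text_set` also passes `in_accept_text_set`, while `extended` passes only the second test, which is what makes the Accept Text Set a proper superset in this sketch.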

There is a critical distinction between the items that are in the Defined Text Set (names with only first and last) and the items that are in the Accept Text Set. Many items are in the Accept Text Set without being in the Defined Text Set; first, last, middle is one of them.

We call the gap between the Defined Text Set and the Accept Text Set the extensibility gap. The extensibility gap is crucial for versioning because it is the place that future versions of the language will fill with subsequent definitions. This gap, achieved via extensibility, is the key to allowing forward and backward compatibility, because the Accept Text Set of Version 1 allows Texts that are not in the Version 1 Defined Text Set but may be in the Defined Text Set of a later version.

To achieve compatibility, the extensibility point must be coupled with a processing model that converts (or maps or substitutes or transforms, etc.) a Text in the Accept Text Set into a Text that is in the Defined Text Set. We call this process a substitution rule.

One substitution rule is the "Must Ignore Unknowns" rule. It specifies that anything unknown but allowed is ignored. Once the extension is ignored, the Text is in the V1 Defined Text Set: it is the unknown extension that put the Text outside the Defined Text Set, so ignoring it moves the Text back in. The substitution looks something like:

Figure 3.
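As a rough sketch of the Must Ignore Unknowns substitution, again modeling a Text as a list of (name, value) pairs: the code below assumes that everything after the leading first, last pair is an extension unknown to Version 1. The function names and the sample Text are hypothetical.

```python
def must_ignore_unknowns(text):
    """Map a Text in the V1 Accept Text Set to one in the V1 Defined Text Set.

    Assumption: everything after the leading first, last pair is an
    extension that Version 1 does not know, so it is dropped.
    """
    return list(text[:2])

def process_v1(text):
    """A hypothetical Version 1 consumer that only understands first and last."""
    (_, first), (_, last) = must_ignore_unknowns(text)
    return f"{first} {last}"

# A Text that a later version might define, carrying an extra middle item.
later_text = [("first", "Ada"), ("last", "Lovelace"), ("middle", "King")]

print(must_ignore_unknowns(later_text))  # [('first', 'Ada'), ('last', 'Lovelace')]
print(process_v1(later_text))            # Ada Lovelace
```

The Version 1 consumer handles the extended Text without failing, which is the forward-compatible behavior the substitution rule is meant to provide.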

If the processing model for extensions were something like "catch fire and die if unknown" (obviously not a substitution rule), then there would be no way of introducing an extension without also revising the Defined Text Set, and thus no forward compatibility.

There is also some subtlety in the exact nature of the extensibility point. It could allow anything at all, such as: first, last, first or even first, last, last. Or it could allow only things that aren't already defined, so first, last, first would be disallowed. We differentiate these types by calling them "inclusive" and "exclusive" extensibility, but we will put this distinction aside.
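The difference between the two kinds of extensibility point can be sketched as a single check over the sequence of item names. This is only an illustration; the `accepts` function and its `exclusive` flag are hypothetical names, not part of the Name Language.

```python
DEFINED_NAMES = {"first", "last"}

def accepts(names, exclusive):
    """Check a sequence of item names against first, last, *.

    With exclusive=False (inclusive extensibility) the extension may
    contain anything, including already-defined names. With
    exclusive=True the extension may not reuse a defined name.
    """
    if list(names[:2]) != ["first", "last"]:
        return False
    extension = names[2:]
    if exclusive:
        return not any(name in DEFINED_NAMES for name in extension)
    return True

print(accepts(["first", "last", "first"], exclusive=False))   # True
print(accepts(["first", "last", "first"], exclusive=True))    # False
print(accepts(["first", "last", "middle"], exclusive=True))   # True
```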
