Enforcing Association Cardinality
If you're like me, XML document design brings out your darker side. We like a schema whose heart is stone -- a schema that's just itching to call a candidate document on the carpet for the slightest nonconformity. None of this any-element-will-do namespace validation for us. We enjoy the dirty work: schemas, we think, are best built to be aggressive in ferreting out the little mistakes in the information set that could later confuse the more sheltered and constructive-minded XML application.
In this article, we'll practice a bit of that merciless science. Specifically, we'll look at ways to control the cardinality of associations between XML elements. The basic implementation, which we'll review momentarily, of an association between two types is simple enough but is only sufficient for many-to-one relationships. What if multiple references, or zero references, to an item are unacceptable? What if a referenced item may or may not be present? These variations will require other techniques, and these are essential for the truly draconian schema author.
This article will use a simple UML notation to illustrate patterns and examples. Knowledge of both XML Schema 1.0 (Part 1 in particular) and UML is assumed, although in developing our notation we'll have a chance to review a little of both.
A Simple UML Notation
The Unified Modeling Language (UML) provides a basis for a simple notation which will serve our needs in identifying rudimentary design patterns and in illustrating specific examples. (Note that many UML-to-Schema mappings are possible; see the "XMI Production for XML Schema" specification from the OMG for one much more formal option.)
First, let's say that a single complex type is what UML calls a
class, with XML attributes and simple-type child elements
represented by UML attributes -- we're also borrowing the XPath
@ shorthand to distinguish an XML attribute from a
simple-type child element.
<xs:complexType name="Part"> <xs:sequence> <xs:element name="Name" /> <xs:element name="Price" /> </xs:sequence> <xs:attribute name="PartID" /> <xs:attribute name="InStock" /> </xs:complexType>
Expressing Compositions: Content Models
The above example also represents the most basic sort of composition, in which all the components are of simple type. Composition of one complex type in terms of another in UML is most literally denoted as an aggregation by value; note the use of a name on the composing type and the depiction of cardinality:
<xs:complexType name="Dealership"> <xs:sequence> <xs:element name="InventoryItem" type="Part" minOccurs="0" maxOccurs="unbounded" /> <!-- Other content omitted. --> </xs:sequence> </xs:complexType>
One brief note on composition: it is so basic to the XML information model that it's easy to overuse. It's an excellent habit to challenge the assumption that composition is the best characterization of a given relationship, and with XML Schema, as we're about to see, there's no practical reason to avoid implementing a looser association where it's appropriate to the design.
Expressing Associations: Key Relationships
For types that are related other than by composition the UML association is a useful concept. It declares that between two types there is a clear relationship, which may be named on either side or on both, and usually that there is a clear understanding of cardinality as well. The association also may be navigable in either or both directions; to navigate the association is to find an associated instance based on knowledge of an associating instance.
Although association is an OO concept, its implementation in XML
Schema relies on a classic relational database feature, the
foreign-key relationship. A key
must be defined to allow unique instances of the associated
Salesman above) to be identified; and the
instances of the associating type (
Car) rely on
reference to point to referenced items. These two constraints are
very close in spirit to RDBMS primary and foreign keys,
<xs:element name="Dealership"> <xs:complexType> <xs:sequence> <!-- Referenced types Car and Salesman not shown. --> <xs:element ref="Car" minOccurs="0" maxOccurs="unbounded" /> <xs:element ref="Salesman" minOccurs="0" maxOccurs="unbounded" /> </xs:sequence> </xs:complexType> <xs:key name="SalesmanKey"> <xs:selector xpath="Salesman" /> <xs:field xpath="@SellerID" /> </xs:key> <xs:keyref name="CarToSalesman" refer="SalesmanKey"> <xs:selector xpath="Car" /> <xs:field xpath="@SoldBy" /> </xs:keyref> </xs:element>
The precise validation rules that are realized by these two components are as follows:
- Each element of type
Salesmanmust be found to be unique among its direct siblings (not in the whole document), as evaluated by the fields listed in the governing key.
- Each element of type
Carthat has the fields identified in the governing key reference (it might not, depending on composition cardinality!), the values in those fields must be found to be equal to exactly one corresponding set of values in the referenced key.
The basic implementation of an association, shown above, expresses
many-to-one cardinality: multiple A to singular B. Let's speak
generally now of associations "from A to B". By nature, the
xs:keyref simply assures that a B exists wherever an A
refers to it; it guarantees that there are no dangling references, so
to speak. Validation of a document against a key reference does not
make any checks as to the number of elements that might reference the
same item. The following sections lay out techniques for deviating
from the default association cardinality on each side of the
Controlling Association Cardinality: Optional or Multiple Associated Type
The cardinality of the associated type is the easier of the two:
this is really a matter of controlling the composition of
referencing fields. Think of a referencing field (such as
@SoldBy in the previous example) as a proxy for the
associated object. This proxy can be made a part of the associating
element by composition, and thus all the rules thereof apply exactly.
So, for singular cardinality on type B ("for each A there must be
exactly one B"), the referencing fields will be required attributes or
child elements; in fact in this case it may be the value of the
associating element itself. For optional cardinality ("for each A
there may be one B"), use an optional attribute or child element. For
multiple cardinality ("for each A there may be zero, one or many Bs"),
use child elements and an "unbounded" maximum occurrence
For example, consider a model a newspaper might use for printing movie listings:
Theatre makes reference to various
Movies it might be showing. The Movies schema shows how
to do this by combining an
xs:keyref with occurrence
constraints. The key reference is in no way special, and by itself
describes many-to-one cardinality, which is also desirable for the
Showing type. See also a valid instance
But, wait, where's the "multiple" B type? The diagram
above maps the XML schema construction exactly, and in doing so lays
Showing type as what in an RDB would be a
link table, and what can be represented in UML as a link
attribute. The alternate notation below uses the link attribute
to clarify the role of
Showing, and also states the
cardinality of the fundamental relationship between
Movie more accurately:
Controlling Association Cardinality: Optional Associating Type
On the associating side, we can implement optional cardinality by
insisting on the uniqueness of the referencing fields: in other
words, multiple A elements referencing the same B are disallowed. For
this purpose a third XML Schema construct can be used: the uniqueness
constraint, implemented using the
component. So we'll assert a key on B, a key reference on A, and an
additional uniqueness constraint on A, using the same fields found in
the key reference.
Let's say a restaurant keeps its reservation book as an XML document. It lists its tables, organizes each evening into planned seating times, and then captures each reservation as a name and number of people related to table and time:
Clearly, though each reservation references a table and seating
time, we don't want double-bookings. The Restaurant schema
enforces this constraint by laying an
xs:keyref, so that each combination of table
number and time must be unique. A valid instance document is Book.xml.
A more practical version of this design would include dates, rather than repeating all that table and seating information in multiple documents. There are a few approaches to this:
- We might make the date another attribute of
Reservation, and add it to the uniqueness constraint. See RestaurantWithDates.xsd and BookWithDates.xml.
- A more natural content model might call for an
Eveningelement composed of
Reservations. XML Schema identity constraints really shine here, because they can leverage the natural partitions created via element composition. The constraint can be placed at the appropriate scope, forbidding unique table-time pairs within a given
Eveningwhile allowing them within the broader document. See RestaurantByDates.xsd and BookByDates.xml. Note that the
xs:keyrefcomponents are unchanged, and that the
xs:uniquehas simply moved to the
Eveningscope and lost its
Controlling Association Cardinality: Singular Associating Type
To achieve singular cardinality on A is to insist that for every B there is some referencing A. This is the exact inverse of the key reference that's already in place, and the way to implement this cardinality is to create a symmetrical key and key reference. So, for a one-to-one association, there would be two keys and two key references, one each on types A and B.
Notice that this will work even when the cardinality of B is not singular. The idea of "exactly one A optionally referencing a B" may fit into the human brain only with some pushing and twisting, but it is both conceptually sound and technically feasible. This sort of relationship often will make more sense at every level if it is restated as "for every B there is exactly one A", and certainly this is simpler to implement. However, there may be tension between ease of implementation and the correct naming and navigability as designed: "inversion" to the implementer may be "perversion" to the designer. Keep in mind that the effort here is all in the schema; there is no burden on the instance documents, and if anything they may be most readable and intuitive according to the original design.
The following table summarizes the techniques for various multiplicity requirements for each side of the relationship:
|Multiplicity||Technique for A||Technique for B|
||Multiple child elements|
||Optional attribute or child element|
||Required attribute or child element, or the value of B itself|
Will Provost is an independent consultant in software architecture and design. He has been focused for the last few years on distributed systems using J2EE, CORBA, and XML. He is also the coordinator and primary author of Object Innovations' courseware on Java and XML topics.