XML.com: XML From the Inside Out

Normalizing XML, Part 2
by Will Provost

Scope of Uniqueness

Another important difference between relational database schemas and WXS concerns the scope of key uniqueness. A primary key in a relational database must be unique across its entire table; there is no narrower scope. By contrast, a WXS key is defined on some element and governs uniqueness of key values within each instance of that element. Thus it is simple enough to assert uniqueness for an element or attribute at some scope smaller than the instance document. This allows "global" uniqueness to be enforced through progressively smaller scopes, such that the global identifier for a datum is a path consisting of several tokens, rather than a single value.

An airline, for example, might record its staffing schedule in a hierarchy from Airline to Flight to Date to Position. The last element would include a position name and the name of the employee filling that position. (Yes, there'd probably be an employee key, instead, but we have to stop somewhere.)

If we want to assert uniqueness over position name, we can do so only within a certain flight on a certain date: while we can't have two captains on one plane, we certainly need one for each plane that leaves the ground. So the "path" to a particular staffing fact would be expressed as //airline/flight/date/position/employee. If this looks a lot like XPath, no wonder. This path-based addressing fits XML's hierarchical structure. The ability to define WXS keys at subdocument scopes supports such paths and relieves the document designer of the need to attach a global ID -- which seldom has any domain relevance -- to every datum, as is common practice in relational database design.
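
A minimal sketch of such a scoped key follows. The element names (date, position, name), the type name, and the air prefix are all hypothetical; the fragment assumes that air is bound to the schema's target namespace and that the XML Schema namespace is the default, as in the other fragments in this article.

```xml
<!-- Hypothetical declaration: the key lives on date, not on the document root. -->
<element name="date" type="air:Date">
  <!-- Position names must be unique within each date element only,
       so every flight on every date can have its own captain. -->
  <key name="PositionKey">
    <selector xpath="./air:position"/>
    <field xpath="air:name"/>
  </key>
</element>
```

Because the key is declared on date, the same position name can recur under any other date or flight without conflict.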

There is a downside to this facility, however. The catch is that a keyref can't be defined to traverse multiple scopes: its refer attribute names exactly one key, so it cannot chain through several keys. A keyref must therefore work at the same scope as the key it references in order to be effective. This poses some problems when defining associations.

Consider a simple workflow model, in which an Actor defines available inputs and outputs by name and type, and a Flow defines connections from Actor to Actor, specifying the wiring from source outputs to destination inputs. Not shown in the UML below is the encompassing element Process, which collects Flow and Actor instances to define some abstract workflow.

[Figure: UML diagram of the workflow model]

A Flow references two Actors; for each of these references a keyref is defined, as shown in this fragment of the total schema Workflow1.xsd:

<element name="process" type="work:Process" >
  <!-- Each actor in a process must have a unique name. -->
  <key name="ActorKey" >
    <selector xpath="./work:actor" />
    <field xpath="work:name" />
  </key>
  <!-- Each (sourceActor, destinationActor) pair identifies one flow. -->
  <key name="FlowKey" >
    <selector xpath="./work:flow" />
    <field xpath="work:sourceActor" />
    <field xpath="work:destinationActor" />
  </key>
  <!-- A flow's sourceActor must match the name of an actor in this process. -->
  <keyref name="FlowSource" refer="work:ActorKey" >
    <selector xpath="./work:flow/work:sourceActor" />
    <field xpath="." />
  </keyref>
  <!-- Likewise for a flow's destinationActor. -->
  <keyref name="FlowDestination" refer="work:ActorKey" >
    <selector xpath="./work:flow/work:destinationActor" />
    <field xpath="." />
  </keyref>
</element>
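
To make the constraints concrete, here is a hypothetical instance fragment that these keys and keyrefs should accept. The namespace URI and the actor names are invented for illustration, and any other content that the Process, Actor, or Flow types require is omitted.

```xml
<!-- The namespace URI below is a placeholder, not the article's real one. -->
<work:process xmlns:work="urn:example:workflow">
  <work:actor>
    <work:name>Physician</work:name>
  </work:actor>
  <work:actor>
    <work:name>Scheduler</work:name>
  </work:actor>
  <!-- Both values match an actor name in this process, so the
       FlowSource and FlowDestination keyrefs are satisfied. -->
  <work:flow>
    <work:sourceActor>Physician</work:sourceActor>
    <work:destinationActor>Scheduler</work:destinationActor>
  </work:flow>
</work:process>
```

Changing sourceActor to a name that no actor carries, or adding a second actor named Physician, should cause a validating processor to report an identity-constraint violation.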

We encounter a problem at the next level of the hierarchy. How can we assert that a Connection references two Endpoints? Endpoint instances must be named uniquely, but only within each Actor instance, as shown above. If we try to reference this key from a parent scope (such as Process) or a sibling scope (Flow), there's no way to express that we want to reference an Endpoint by name within a particular Actor. (We might hope that the parser would be smart enough to narrow the scope automatically to the Actor instance referenced by the parent Flow, but this is asking too much, and certainly isn't supported in the WXS specification.) Owing to this limitation, the schema does not assert any association from Connection to Endpoint; if the input or output names in RequestMedicalProcedure1.xml were not accurate, validation would not catch the problem.
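
The Endpoint key described here might be sketched as follows. This is an assumption about its shape, not a quotation from Workflow1.xsd: the child element names input and output and the key name EndpointKey are hypothetical.

```xml
<!-- Hypothetical sketch: the key is scoped to a single actor. -->
<element name="actor" type="work:Actor">
  <!-- Endpoint names are unique only within this one actor, so two
       different actors may both define an endpoint with the same name. -->
  <key name="EndpointKey">
    <selector xpath="./work:input | ./work:output"/>
    <field xpath="work:name"/>
  </key>
</element>
```

A keyref declared on process or flow would have to pick out one actor's set of endpoint names based on the value of sourceActor or destinationActor, and the selector and field syntax offers no way to say that.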

Possible workarounds include:

  • Break compositions in the referenced structure into associations, making the corresponding keys global in scope and thus easy to reference. For instance, Endpoints could be defined outside of, and referenced by, Actors. This gains a reference that can be validated (e.g. from Connection to Endpoint) but loses the aforementioned advantage of composition.

    Screen shot.
  • Define a global ID for each referenced element. This preserves composition while allowing global key reference. This is the approach taken in Workflow2.xsd; note the new ID attribute, whose values must be managed manually, by the application, or by some authoring tool.

  • Leave the WXS domain to enforce and to navigate the desired association. For instance, a validating transform could enforce the missing workflow constraint. Application logic would also have to be written to navigate from a Connection to the corresponding Endpoints, probably as DOM nodes.
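
The global-ID workaround could look roughly like the sketch below. It is not taken from Workflow2.xsd, which may differ in detail: the element names input, output, connection, and sourceEndpoint, and the key and keyref names, are assumptions made for illustration.

```xml
<!-- Hypothetical sketch: endpoint IDs are keyed at process scope. -->
<element name="process" type="work:Process">
  <!-- Every endpoint of every actor carries an ID that is unique
       across the whole process. -->
  <key name="EndpointKey">
    <selector xpath="./work:actor/work:input | ./work:actor/work:output"/>
    <field xpath="@ID"/>
  </key>
  <!-- Key and keyref now share one scope, so a connection can
       reference an endpoint directly by its ID. -->
  <keyref name="ConnectionSource" refer="work:EndpointKey">
    <selector xpath="./work:flow/work:connection"/>
    <field xpath="work:sourceEndpoint"/>
  </keyref>
</element>
```

The cost is the one noted above: the ID values carry no domain meaning and something outside the schema has to assign them.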


