Modeling XML Vocabularies with UML: Part II

September 19, 2001

In Part I of this series, I emphasized that models are an inevitable part of system analysis and design, even if a model is sometimes only in the developer's mind. By using UML to capture a conceptual model of the planned vocabulary, we are able to clarify the essential terms and relationships without getting caught up in the syntactic issues of the chosen schema language. In fact, industry standards groups may wish to use UML as the primary definition for their vocabularies and leave the final choice of schema language(s) to implementing vendors.

I also want to emphasize that choosing a model-driven approach to schema design does not force you into a long waterfall development process. The approach described in these articles illustrates an evolutionary and incremental development process. The first schema produced using default mapping rules from this purchase order model may not be ideal, but it accurately captures the domain semantics that were modeled. Part III of this series describes how the model may be specialized to capture design characteristics that are unique to XML schema generation. This approach is compatible with the contemporary methodologies for agile programming and modeling, where the models fulfill a very pragmatic role in the development process. (See XMLmodeling.com, a web portal that I have created to gather case studies and modeling resources.)

In order to achieve these rather lofty objectives, it's essential that we have a complete, flexible mapping specification between UML and XML schemas. The following examples do not present the complete picture but attempt to ease you into a maze of terminology from UML and the W3C XML Schema Definition Language (which I'll refer to hereafter as XSD).

Mapping UML Models to XML Schema

This is where the rubber meets the road when using UML in the development of XML schemas. A primary goal guiding the specification of this mapping is to allow sufficient flexibility to encompass most schema design requirements, while retaining a smooth transition from the conceptual vocabulary model to its detailed design and generation.

A related goal is to allow a valid XML schema to be automatically generated from any UML class diagram, even if the modeler has no familiarity with the XML schema syntax. Having this ability enables a rapid development process and supports reuse of the model vocabularies in several different deployment languages or environments, because the core model is not overly specialized to XML.

Please note that the schema examples in this article are not fully compatible with the corresponding example in the XML Schema Primer. Nonetheless, the following schema fragments are still valid interpretations of the conceptual model. The third article in this series will continue the refinement process to its logical conclusion where the resulting schema can validate the XSD Primer example.

The conceptual model for purchase orders shown in Figure 1 is duplicated with very slight modification from the first article. We'll dissect this diagram into all of its major structures and map each part to the W3C XML Schema definition language. I'll note several situations where other alternatives are possible and also point out where the schema differs from the XSD Primer example.

Figure 1. Conceptual model of purchase order vocabulary.

Class and Attribute

A class in UML defines a complex data structure (and associated behavior) that maps by default to a complexType in XSD. As a first step, the PurchaseOrder class and its UML attributes produce the following XML Schema definition:

<xs:complexType name="PurchaseOrder">
  <xs:all>
    <xs:element name="orderDate" type="xs:date"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="comment" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
  </xs:all>
</xs:complexType>

The attributes in a UML class are not restricted to a particular order, so an XSD <xs:all> element is used to create an unordered model group. In addition, a UML class creates a distinct namespace for its attribute names (i.e. two classes can contain attributes having the same name), so these are produced as local element definitions in the schema. See A New Kind of Namespace for more explanation of this topic. Both of these UML attributes are optional, indicated by [0..1] in Figure 1. These are mapped to minOccurs and maxOccurs attributes in the XSD. The UML attributes are defined using primitive data types from the XSD specification, so these are written directly to the generated schema using the appropriate namespace prefix. If other data types are used in the UML model, then an XSD type library can be created to define these types for use in a schema. For example, I have created an XSD type library for the Java primitive types and common Java classes such as Date, String, Boolean, etc.

As a useful default, a top-level element is automatically created for each complexType in the schema. The default name for this element is the same as the class name; this is allowed in W3C XML Schema because it uses separate namespaces within the schema itself for complexTypes and top-level elements. For PurchaseOrder, the top-level schema element is created as follows:

<xs:element name="PurchaseOrder" type="PurchaseOrder"/>
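To make the effect of these defaults concrete, here is a minimal sketch of an instance document that the declarations at this stage of the mapping would accept. It assumes the schema has no target namespace (none is shown in the fragments), and the values are purely illustrative. Because the elements sit in an <xs:all> group, comment may appear before orderDate, and either element may be omitted.

```xml
<!-- Illustrative instance; assumes a schema with no target namespace -->
<PurchaseOrder>
  <comment>Hurry, my lawn is going wild!</comment>
  <orderDate>1999-10-20</orderDate>
</PurchaseOrder>
```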

If you refer to the XSD Primer example, you'll see three differences. First, orderDate is modeled as an XML attribute, not a local element in PurchaseOrder. Second, the Primer uses a <sequence> model group instead of <all>. Third, the top-level element is defined in the Primer using a lower-case first letter, i.e. purchaseOrder (often called "lower camel case" format). All of these differences are addressed in the third article by using a UML profile to expand the mapping to XML schemas.

Association

The PurchaseOrder type is specified not only by its UML attributes but also by its associations to other classes in the model. Figure 1 includes three associations that originate at PurchaseOrder, as indicated by the navigation arrows at their opposite (target) ends. Each association has a role name and a multiplicity that specify how the target class is related. These associations are added to the model group of the XSD complexType along with the elements created from the UML attributes.

<xs:complexType name="PurchaseOrder">
  <xs:all>
    <xs:element name="orderDate" type="xs:date"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="comment" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="shipTo">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Address"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="billTo">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Address"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="items" minOccurs="0" maxOccurs="1">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Item"
                      minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:all>
</xs:complexType>

Because the UML attributes for orderDate and comment have primitive data types, the schema embeds these values as element content. However, the default mapping for associations creates a wrapper element in XSD corresponding to the role name in UML. This element then contains the instances of the associated class, which the schema refers to using the top-level element created for each complexType.

If you want to create a W3C XML Schema using the <all> content model, then a wrapper element is necessary whenever the associated class has more than one occurrence. This is because <all> can be used only when the contained elements have either [0..1] or [1..1] multiplicity. So when generating the wrapper element for the association with Item, the element named items allows zero or one instances, and it holds zero or more Item elements within it.
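The restriction is easiest to see in a declaration that tries to skip the wrapper. The fragment below is a sketch of what the generator must avoid under XSD 1.0, not something taken from the generated schema: a repeating element placed directly inside <xs:all> is rejected by a conforming XSD 1.0 processor.

```xml
<!-- Counter-example (not valid in XSD 1.0): particles inside xs:all
     may have maxOccurs of 0 or 1 only -->
<xs:complexType name="PurchaseOrder">
  <xs:all>
    <xs:element name="comment" type="xs:string" minOccurs="0"/>
    <xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
  </xs:all>
</xs:complexType>
```

Wrapping the repeating Item elements in a single items element, as the generated schema does, keeps every member of the <xs:all> group within the allowed range.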

The difference between this default schema generated from UML and the schema included in the XSD Primer is that the Primer's shipTo and billTo roles contain the address content directly, without use of an element for the associated class. In other words, child elements for name, street, city, etc. are contained directly within shipTo and billTo. This design alternative is covered in the extensions presented in the third article.
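As a rough side-by-side sketch, the two styles of declaration for the shipTo role look like this. The second form follows the Primer's purchase order schema, which types the role element as USAddress; treat it as an illustration of the alternative rather than as output of the default mapping.

```xml
<!-- Default mapping: role element wraps an element for the associated class -->
<xs:element name="shipTo">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Address"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<!-- Primer style: role element carries the address content directly -->
<xs:element name="shipTo" type="USAddress"/>
```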

User-Defined Datatype

The default mapping to XSD would produce a complexType definition for SKU and QuantityType, but we want these to become user-defined simple datatypes in the XML Schema. This is easily achieved by adding a UML stereotype to each of these two classes, which is shown as <<XSDsimpleType>> in Figure 1. This ability to include stereotypes is an integral part of the UML standard and is used to specify additional model characteristics that are usually unique to a particular domain; in this case, unique to XML schema design.

Using the stereotype, the schema generator knows to create the following definition for SKU:

<xs:simpleType name="SKU">
  <xs:annotation>
    <xs:documentation>Stock Keeping Unit, a code
         for identifying products</xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:pattern value="\d{3}-[A-Z]{2}"/>
  </xs:restriction>
</xs:simpleType>

A UML model may also include documentation for any of its model elements, which is passed through to the XML schema definition as shown in this example. The UML generalization relationship indicates which existing simple datatype should be used as the base for this user-defined type. Finally, the pattern attribute on SKU is mapped to an XSD facet that constrains the SKU string value.
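As a small illustration of how the facet behaves, suppose some element is declared with the SKU type. The element name partNum is an assumption made for this sketch (it is not part of the fragments shown here), and the block mixes a schema declaration with instance fragments purely for comparison; the comments indicate which values the pattern accepts.

```xml
<!-- Hypothetical declaration that uses the generated simple type -->
<xs:element name="partNum" type="SKU"/>

<!-- Accepted: three digits, a hyphen, two upper-case letters -->
<partNum>872-AA</partNum>

<!-- Rejected: lower-case letters do not match [A-Z]{2} -->
<partNum>872-aa</partNum>
```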

The second module in the purchase order schema definition represents a reusable set of specifications for addresses, as shown in Figure 2. These definitions are taken directly from section 4.1 of the XSD Primer. Two additional schema constructs are required by this model, in addition to those used when producing a schema from Figure 1.

Figure 2. Modularized Address schema component

Generalization

A fundamental and pervasive concept in object-oriented analysis and design is generalization from one class to another. The specialized subclass inherits attributes and associations from all of its parent classes. This is easily represented in W3C XML Schema, although it requires more indirect mechanisms when producing other XML schema languages.

In Figure 2, the Address class is shown in italic font, which is used in UML to indicate that this is an abstract class, only intended to be used for deriving other specialized classes. Following the same default rules used for PurchaseOrder, the complexType definitions for Address and USAddress are produced as follows:

<xs:element name="Address" type="Address" abstract="true"/>
<xs:complexType name="Address" abstract="true">
  <xs:all>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="street" type="xs:string"/>
    <xs:element name="city" type="xs:string"/>
  </xs:all>
</xs:complexType>

<xs:element name="USAddress" type="USAddress"
            substitutionGroup="Address"/>
<xs:complexType name="USAddress">
  <xs:complexContent>
    <xs:extension base="Address">
      <xs:all>
        <xs:element name="state" type="USState"/>
        <xs:element name="zip" type="xs:positiveInteger"/>
      </xs:all>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

There are three differences from previous examples. First, the top-level element and complexType definitions for Address include the XSD attribute abstract="true". Second, the USAddress element includes substitutionGroup="Address", which means that whenever the Address element is required as a content element, then USAddress may be substituted in its place. Thus, we may use USAddress (or, similarly, UKAddress) as the content of shipTo and billTo in the PurchaseOrder.
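An instance fragment makes the substitution visible. The sketch below assumes no target namespace and uses illustrative values; it shows the shape these declarations are meant to describe, with the concrete USAddress element standing in wherever the abstract Address element is referenced. The optional items element is omitted.

```xml
<!-- Illustrative instance; values are examples only -->
<PurchaseOrder>
  <orderDate>1999-10-20</orderDate>
  <shipTo>
    <USAddress>
      <name>Alice Smith</name>
      <street>123 Maple Street</street>
      <city>Old Town</city>
      <state>PA</state>
      <zip>95819</zip>
    </USAddress>
  </shipTo>
  <billTo>
    <USAddress>
      <name>Robert Smith</name>
      <street>8 Oak Avenue</street>
      <city>Old Town</city>
      <state>PA</state>
      <zip>95819</zip>
    </USAddress>
  </billTo>
</PurchaseOrder>
```

An <Address> element could not appear inside shipTo or billTo, because both the element and its type are declared abstract.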

Third, the complexType definition for USAddress is extended from the base complexType named Address. There is, however, a significant point of difference in how this inheritance structure is interpreted in UML versus in XSD. In UML, the order of attributes and associations within a class is not specified and the features inherited from parent classes are freely intermingled with locally defined attributes and associations in a subclass. In XSD, inherited elements are treated as a group, so the three elements inherited from Address are an unordered group in USAddress, followed in sequence by another unordered group of the two elements defined in USAddress. You cannot define an unordered group of the five elements when one or more are inherited.
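One practical caveat: XSD 1.0 allows <xs:all> only as the outermost group of a content model, so some validators reject an <xs:all> group that extends a base type whose content is also <xs:all>. If that is a concern, a conservative variant uses <xs:sequence> in both types, much as the Primer's address module does. The sketch below assumes that a fixed element order is acceptable.

```xml
<!-- Variant sketch: sequence groups instead of all groups -->
<xs:complexType name="Address" abstract="true">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="street" type="xs:string"/>
    <xs:element name="city" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="USAddress">
  <xs:complexContent>
    <xs:extension base="Address">
      <!-- Appended after the inherited name, street, city -->
      <xs:sequence>
        <xs:element name="state" type="USState"/>
        <xs:element name="zip" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

With this variant the inherited name, street, and city elements come first, in that order, followed by state and zip.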

Enumerated Datatype

The state element of USAddress refers to a simple type definition for USState, which is generated from a UML enumeration. In Figure 2, USState is shown with an <<enumeration>> stereotype that notifies the schema generator to create an XSD enumeration value for each of the attributes defined for this class. An enumerated type in XSD is just a specialized kind of simpleType definition, so the enumeration class must also specify a superclass in UML to use as a base type in XSD. The schema is generated as follows:

<xs:simpleType name="USState">
  <xs:restriction base="xs:string">
    <xs:enumeration value="AK"/>
    <xs:enumeration value="AL"/>
    <xs:enumeration value="AR"/>
    <xs:enumeration value="PA"/>
  </xs:restriction>
</xs:simpleType>

Conclusions

The default mapping rules described in this article can be used to generate a complete XML schema from any UML class diagram. This might be a pre-existing application model that now must be deployed within an XML web services architecture, or it might be a new XML vocabulary model intended as a B2B data interchange standard. In either case, the default schema provides a usable first iteration that can be immediately used in an initial application deployment, although it may require refinement to meet other architectural and design requirements.

The first article in this series presented a process flow for schema design that emphasized the distinction between designing for data-oriented applications versus text-oriented applications. The default mapping rules are often sufficient for data-oriented applications. In fact, these defaults are aligned with the OMG's XML Metadata Interchange (XMI) version 2.0 specification for using XML as a model interchange format. This approach is also well aligned with the OMG's new initiative for Model Driven Architecture (MDA).

Text-oriented schemas, and any other schema that might be authored by humans and used as content for HTML portals, often must be refined to simplify the XML document structure. For example, many schema designers eliminate the wrapper elements corresponding to an association role name (but this also prevents use of the XSD <all> model group). This refinement and many others can be specified in a vocabulary model by setting a new default parameter for one UML package, which then applies to all of its contained classes.

We saw two examples of UML stereotypes in this article, which were used to indicate a specialized use of a UML class. More generally, these stereotypes and their associated property values are part of a UML profile for XML Schemas that I initially developed as part of my book on modeling XML applications. The third article in this series provides additional examples of using other stereotypes to customize the generated schema. I will also include a description of a web-based tool we have developed that implements the complete UML profile for schema design and transforms any UML class model to either a W3C XML Schema or an OASIS RELAX NG grammar.

Tips for Success

In order to help you when applying these ideas to your own e-business projects, I offer the following tips for success.

• Plan for conceptual models of your business vocabularies that are reusable in several different deployment contexts, i.e. W3C XML Schema, DTD, relational DBMS, Java or EJB, etc. Alternative UML profiles can be used to transform the common business model to alternative platforms. But be aware that full realization of this goal is beyond the capabilities of many current UML tools.
• Pre-existing UML models might be specialized to their deployment platform, platform libraries, and datatypes (Java, .NET, etc.). Isolate the platform-independent domain model to enable its reuse and to generate XML schemas for data interchange.
• Use consistent modeling guidelines for naming and structure, both within a single vocabulary and across a set of related models. For example, the FpML architecture specification provides clear guidelines for writing DTDs that are easily transferred to UML models, or any other object-oriented framework.