Modeling XML Vocabularies with UML: Part III

October 10, 2001

Dave Carlson

This article is the third installment in a series on using UML to model XML vocabularies. The examples are based on a simple purchase order schema included in the W3C XML Schema Primer, and we've followed an incremental development approach to define and refine this vocabulary model with UML class diagrams. The first objective of this third article is to complete the process of refining the PO model so that the resulting schema is functionally equivalent to the one contained in the XSD Primer.

Table of Contents

UML Profile for XML Schema

Customizing the PO Schema Design Model

Creating XML Schemas with hyperModel

Schema Modularity and Reuse

Tips for Success

The second objective is to broaden our perspective for understanding how UML can contribute to the analysis and design of XML applications. This subject is of book-length complexity (I've written one such book and am planning another), but more specific goals are possible if we keep our focus on the design of XML vocabularies. The following list summarizes several goals that guide our work.

  • Create a valid XML schema from any UML class structure model, as described in the first two parts of this series.
  • Refine the conceptual model to a design model specialized for XML schema by adding stereotypes and properties that are based on a customization profile for UML.
  • Support a bi-directional mapping between UML and XSD, including reverse engineering existing XML schemas into UML models.
  • Design and deploy XML vocabularies by assembling reusable modules.
  • Integrate XML and non-XML information models in UML, for example to represent both XML schemas and relational database schemas in a larger system.

Even this relatively narrow scope covers a broad terrain. The following introduction to a UML profile for XML adds a critical step toward all of these goals. These extensions to UML allow schema designers to satisfy specific architectural and deployment requirements, analogous to physical database design in an RDBMS. And these same extensions are necessary when reverse engineering existing schemas into UML because we must map arbitrary schema structures into an object-oriented model. But don't get hung up on the details of the UML profile; in most cases, a few well-placed stereotypes and properties will achieve your design objectives without distracting from the clarity of vision in your conceptual model.

UML Profile for XML Schema

Previous Articles in this Series

Modeling XML Vocabularies with UML: Part One

Modeling XML Vocabularies with UML: Part Two

UML provides a foundation for modeling structure and behavior of most software systems, but there are domain-specific situations that require additional model information to be captured by the analyst beyond what is possible with UML. This issue is solved through the use of UML extension profiles. A UML profile has three key items: stereotypes, tagged values (properties), and constraints. A profile provides a definition of these items and explains how they extend the UML in a particular domain, which is XML schema design in our case.

We had a brief encounter with UML stereotypes previously: <<XSDsimpleType>> was used to indicate that a class should be mapped to a user-defined datatype in the schema. A complete definition of the UML profile for XML Schema is beyond the scope of this article, but it's included in an appendix of my book and soon will be posted on XMLmodeling.com. Three stereotypes are introduced here along with a few tagged value properties. Their use will become clearer in the following section where we apply them to customize the purchase order model.

Each stereotype is assigned to one or more UML constructs that are modified by the profile extension. Each stereotype can be further specified by adding one or more properties that refine its meaning or impact on a model. For example, a stereotype assigned to a UML class extends the meaning of a "class" within the profile's domain and the stereotype's properties are added to the specification of that class in the model. Three stereotypes from the UML Profile for XML Schema are summarized as follows:

<<XSDcomplexType>> on a UML class

  • modelGroup (all | sequence | choice)
  • attributeMapping (element | attribute)
  • roleMapping (element | attribute)
  • elementNameMapping (upperCamelCase | lowerCamelCase | hypenLowerCase | omitElement )

<<XSDelement>> on a UML attribute or association end

  • position (integer value) within a sequence model group
  • anonymousType (true | false)
  • anonymousRole (true | false)

<<XSDattribute>> on a UML attribute or association end

  • use (prohibited | optional | required | fixed)

Other stereotypes can be used (e.g. <<XSDsimpleType>> on a UML class or <<XSDfacet>> on a UML attribute) to modify the meaning of those structures in the XML schema without specifying additional properties.

Many of the profile property values can be set as defaults for an entire UML model or for one package within that model. For example, the default modelGroup property can be set to 'sequence' for all classes within the model, instead of setting this property for each class individually. Similarly, the default attributeMapping can be set to 'element' so that all UML attributes are produced as XML elements in the schema, unless overridden for one or more individual classes. Individual UML attributes also can be assigned a stereotype that overrides the class or package default mapping.
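The fallback from class to package to model defaults can be pictured as a chained lookup. The following is only a minimal sketch of that idea in Python, not how any particular tool stores its settings; the dictionaries and the function name effective_properties are hypothetical, and only the property names and values come from the profile summary above.

```python
from collections import ChainMap

# Hypothetical settings at each level of the model.
model_defaults = {"modelGroup": "all", "attributeMapping": "element"}
package_defaults = {"modelGroup": "sequence"}          # set once for a package
class_properties = {"attributeMapping": "attribute"}   # override on one class


def effective_properties(class_props, package_props, model_props):
    """Return a lookup that prefers class, then package, then model values."""
    return ChainMap(class_props, package_props, model_props)


props = effective_properties(class_properties, package_defaults, model_defaults)
print(props["modelGroup"])        # taken from the package default
print(props["attributeMapping"])  # taken from the class-level override
```

A property that is not set on the class or its package falls through to the model-wide default, which is the behavior the paragraph above describes.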

Customizing the PO Schema Design Model

Consider the following sample XML document, which is a fragment from an example in the XSD Primer:

<ipo:purchaseOrder
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:ipo="http://www.example.com/IPO"
  orderDate="1999-12-01">

    <ipo:shipTo exportCode="1" xsi:type="ipo:UKAddress">
        <ipo:name>Helen Zoe</ipo:name>
        <ipo:street>47 Eden Street</ipo:street>
        <ipo:city>Cambridge</ipo:city>
        <ipo:postcode>CB1 1JR</ipo:postcode>
    </ipo:shipTo>
  . . .
</ipo:purchaseOrder>

We'll use this instance to derive requirements for refining the UML model. These design requirements are divided into four categories:

  1. Should the attributes of a UML class be produced as XML attributes or child elements in the schema?
  2. Which kind of model group (all, sequence, or choice) should be used to validate an element's content?
  3. Should we choose to include or exclude XML element tags that represent class names and roles in the UML associations?
  4. How do we map UML class names to XML element names?

The UML class diagram shown in Figure 1 includes profile extensions that resolve all of these design choices. This purchase order model should be very familiar by now. It was presented as a conceptual model of the vocabulary in two previous articles and is now refined to include stereotypes and properties that specify the XML schema design model. It's important to note that this is exactly the same structure as shown in previous diagrams, with a few additional labels added.


Figure 1: Design model of purchase order vocabulary

After applying these profile extensions, the following schema is produced for the PurchaseOrder class and its associations:


   <xs:element name="purchaseOrder" type="ipo:PurchaseOrder"/>
   <xs:complexType name="PurchaseOrder">
      <xs:sequence>
         <xs:element name="shipTo" type="ipo:Address"/>
         <xs:element name="billTo" type="ipo:Address"/>
         <xs:element name="comment" type="xs:string" minOccurs="0" maxOccurs="1"/>
         <xs:element name="items" minOccurs="0" maxOccurs="1">
            <xs:complexType>
               <xs:sequence>
                  <xs:element ref="ipo:item" minOccurs="0" maxOccurs="unbounded"/>
               </xs:sequence>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:date"/>
   </xs:complexType>

If you want to work through these examples in more detail, the complete sample document is available, as is the PO module schema and the Address module schema.

The sample purchase order instance document includes two XML attributes: orderDate on the purchaseOrder element and exportCode on the shipTo element. By assigning an <<XSDattribute>> stereotype to the orderDate attribute in UML, we specify that it should be represented as an attribute in XML. The exportCode attribute is similarly stereotyped on the UKAddress class, although it's not shown here. The comment UML attribute in the PurchaseOrder class follows the default mapping to an element in the schema.

The XSD Primer uses a <sequence> model group for all complexType content, whereas the default UML mapping uses an <all> unordered content model. To modify this mapping we assign the <<XSDcomplexType>> stereotype to the PurchaseOrder class and set the modelGroup property to 'sequence'.

But the use of a sequence model group raises a new issue when mapping from UML to XML schemas. UML attributes and associations are inherently unordered within their owning class. So each UML attribute and association end that is part of a sequence group must be annotated with a profile property that specifies its position. These position property values are shown as annotations in Figure 1. The procedure for adding profile stereotypes and property values is different in each UML tool, although any tool that claims compliance with the UML specification must provide some means for adding them.
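To make the role of the position property concrete, here is a small Python sketch that sorts unordered class members by position before emitting a sequence group. It is an illustration under stated assumptions rather than the generator used for this article: the members list is a hypothetical in-memory form of the PurchaseOrder class, the position numbers are chosen to reproduce the order in the schema above, and the items element and occurrence constraints are left out for brevity.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical, deliberately unordered description of the class members.
# "position" mirrors the profile property; XML attributes have no position.
members = [
    {"name": "comment", "type": "xs:string", "kind": "element", "position": 3},
    {"name": "orderDate", "type": "xs:date", "kind": "attribute"},
    {"name": "billTo", "type": "ipo:Address", "kind": "element", "position": 2},
    {"name": "shipTo", "type": "ipo:Address", "kind": "element", "position": 1},
]


def build_complex_type(class_name, members):
    """Build an xs:complexType whose child elements follow the position order."""
    ctype = ET.Element(f"{{{XS}}}complexType", {"name": class_name})
    sequence = ET.SubElement(ctype, f"{{{XS}}}sequence")
    elements = [m for m in members if m["kind"] == "element"]
    for m in sorted(elements, key=lambda m: m["position"]):
        ET.SubElement(sequence, f"{{{XS}}}element",
                      {"name": m["name"], "type": m["type"]})
    # Attribute declarations follow the model group and are not ordered.
    for m in members:
        if m["kind"] == "attribute":
            ET.SubElement(ctype, f"{{{XS}}}attribute",
                          {"name": m["name"], "type": m["type"]})
    return ctype


po_type = build_complex_type("PurchaseOrder", members)
print(ET.tostring(po_type, encoding="unicode"))
```

Whatever order the members arrive in from the UML tool, the sequence group comes out as shipTo, billTo, comment, which is why every member of a sequence needs an explicit position.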

The default mapping rules produce an Address element (or an element for one of its subclasses) contained within the association role elements for shipTo and billTo (see Part II of this series), whereas the required instance document omits the Address tag and embeds its element and attribute content directly within the role tag. To specify this design choice, the <<XSDelement>> stereotype is assigned to the association ends connected to the Address class and the anonymousType property is set to 'true'. The stereotype label is omitted from the diagram to minimize clutter, but the tagged value properties are listed within curly braces.
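The difference between the two forms is easiest to see in the instance documents they allow. The Python sketch below builds both shapes for a shipTo role; the function build_role_element is hypothetical, namespaces are omitted to keep the output short, and the field values are borrowed from the sample document above.

```python
import xml.etree.ElementTree as ET


def build_role_element(role_name, class_name, fields, anonymous_type):
    """Build the instance element for one association end."""
    role = ET.Element(role_name)
    # With anonymousType set, the class tag is omitted and the content
    # sits directly inside the role element.
    parent = role if anonymous_type else ET.SubElement(role, class_name)
    for name, value in fields.items():
        ET.SubElement(parent, name).text = value
    return role


fields = {"name": "Helen Zoe", "city": "Cambridge"}
default_form = build_role_element("shipTo", "Address", fields, anonymous_type=False)
anonymous_form = build_role_element("shipTo", "Address", fields, anonymous_type=True)

print(ET.tostring(default_form, encoding="unicode"))
print(ET.tostring(anonymous_form, encoding="unicode"))
```

The first line printed wraps the content in an Address element inside shipTo; the second places name and city directly under shipTo, matching the structure of the Primer's instance document.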

Because the items role on the association to the Item class is not specified as an anonymousType, its definition in the schema shown above retains the role's container element to hold elements for the related class. The document instance for purchase order items looks like this:

<ipo:purchaseOrder>
    <ipo:items>
        <ipo:item partNum="833-AA">
            <ipo:productName>Lapis necklace</ipo:productName>
            <ipo:quantity>1</ipo:quantity>
            <ipo:USPrice>99.95</ipo:USPrice>
            <ipo:comment>Want this for the holidays!</ipo:comment>
            <ipo:shipDate>1999-12-05</ipo:shipDate>
        </ipo:item>
    </ipo:items>
</ipo:purchaseOrder>

In this situation "anonymous type" has a slightly different, more general meaning than is used in the W3C XML Schema specification. It's easiest to understand the meaning I intend by looking at the UML class diagram in Figure 1 rather than at the XSD Schema document. In the class diagram, if an association end is marked as an anonymousType, then the name of the associated class is anonymous when its instances appear in XML documents, regardless of which schema language is actually used to define those documents. The concept of anonymous types is realized differently in different schema languages.

You may have noticed that the XML document elements for purchaseOrder and item appear with a lower-case first character; this is often called "lower camel case" format. However, the default mapping from UML creates these element names equal to the class names, which begin with upper-case letters. The "upper camel case" convention used in the UML diagram is commonly used in object-oriented models and languages, whereas a variety of conventions are followed in current XML schema vocabularies.

This issue is resolved by adding an elementNameMapping property to a UML class along with the <<XSDcomplexType>> stereotype. This profile property allows an XML schema designer to choose a preferred naming convention when modeling the schema details. Like many other profile properties, this value can be set as a default for the entire model so that all class names will be mapped to XML element names in the same way.
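As a rough illustration of what such a naming rule does, the following Python sketch converts a UML class name into three of the conventions discussed here. The function names are hypothetical, and the treatment of acronyms such as USAddress is deliberately left simple, since the article does not say how the profile handles them.

```python
import re


def lower_camel_case(class_name):
    """PurchaseOrder -> purchaseOrder (only the first character changes)."""
    return class_name[:1].lower() + class_name[1:]


def upper_camel_case(class_name):
    """purchaseOrder -> PurchaseOrder (only the first character changes)."""
    return class_name[:1].upper() + class_name[1:]


def hyphen_lower_case(class_name):
    """PurchaseOrder -> purchase-order.

    A hyphen is inserted wherever a lower-case letter or digit is followed
    by an upper-case letter, then the whole name is lower-cased.
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", class_name).lower()


for convert in (upper_camel_case, lower_camel_case, hyphen_lower_case):
    print(convert.__name__, convert("PurchaseOrder"))
```

Setting one of these as the model-wide default corresponds to applying the same function to every class name when element declarations are generated.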

Creating XML Schemas with hyperModel

To support an iterative modeling process, we have developed a web application that creates XML schemas from UML models. A key enabling technology is the XML Metadata Interchange (XMI) specification from the OMG that defines a standard for serializing UML models as XML documents. Many UML tools now support this standard import-export format, and some use it as their native file format. hyperModel accepts any XMI 1.0 file containing a UML 1.3 model and transforms it into either HTML or several alternative XML schema languages.
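Because an XMI file is ordinary XML, general-purpose XML tools can read a UML model directly. The Python sketch below lists class names from a hand-written, heavily simplified fragment. The element names follow the dotted naming style of XMI 1.0 for UML 1.3 as I understand it, so treat them as an assumption and check them against a file exported from your own tool, which will carry far more detail.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the style of XMI 1.0 / UML 1.3.
# Real exports contain many more elements and attributes.
SAMPLE_XMI = """
<XMI xmi.version="1.0">
  <XMI.content>
    <Model_Management.Model xmi.id="m1">
      <Foundation.Core.ModelElement.name>PO</Foundation.Core.ModelElement.name>
      <Foundation.Core.Namespace.ownedElement>
        <Foundation.Core.Class xmi.id="c1">
          <Foundation.Core.ModelElement.name>PurchaseOrder</Foundation.Core.ModelElement.name>
        </Foundation.Core.Class>
        <Foundation.Core.Class xmi.id="c2">
          <Foundation.Core.ModelElement.name>Item</Foundation.Core.ModelElement.name>
        </Foundation.Core.Class>
      </Foundation.Core.Namespace.ownedElement>
    </Model_Management.Model>
  </XMI.content>
</XMI>
""".strip()


def class_names(xmi_text):
    """Return the names of all classes defined in an XMI document."""
    root = ET.fromstring(xmi_text)
    names = []
    for cls in root.iter("Foundation.Core.Class"):
        # Skip reference-only elements, which carry no xmi.id of their own.
        if cls.get("xmi.id") is None:
            continue
        name = cls.findtext("Foundation.Core.ModelElement.name")
        if name:
            names.append(name)
    return names


print(class_names(SAMPLE_XMI))
```

A schema generator starts from the same kind of traversal and then reads attributes, associations, stereotypes, and tagged values for each class it finds.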

I created the purchase order model in both Rational Rose (www.rational.com) and the open source ArgoUML tool. XMI files can be exported from Rational Rose by using the add-in developed by Unisys and available for download from Rational Software. ArgoUML uses XMI as its native file format. When the purchase order model from either tool is uploaded to hyperModel, the HTML view shown in Figure 2 is displayed in a Web browser.


Figure 2: An HTML view of UML in hyperModel

hyperModel uses XSLT stylesheets to transform the XMI files into other representations, including HTML and XSD schemas. It allows individual classes or entire packages to be transformed and displayed in a browser. When the output transformation is changed to XSD, the schema representation of the selected class is displayed, as shown in Figure 3. Selecting a UML package, e.g. PO or Address, will display the complete schema for a module in our vocabulary design.


Figure 3: Transformation of a UML class to XML Schema

We have found this application to be a tremendous help in learning about and refining an object-oriented approach to the analysis and design of XML schemas. We also use this tool as an integral part of our training classes on modeling XML vocabularies. Even if you don't use UML as a primary design tool in your project, experimenting with schema models and alternative UML profile customizations is a productive way to understand XML schema structures and design guidelines.

Schema Modularity and Reuse

One of the benefits gained by using UML as part of our XML development process is that it enables a thoughtful approach to modular, maintainable, reusable application components. In the first two parts of this series, the PurchaseOrder and Address elements were specified in two separate diagrams, implying reusable submodels. UML includes package and namespace structures for making these modules explicit and also specifying dependencies between them.

Figure 4 illustrates a UML package diagram of this purchase order model. A package, shown as a file folder in a diagram, defines a separate namespace for all model elements within it, including additional subpackages. These UML packages are a very natural counterpart to XML namespaces. A dashed line arrow between two packages indicates that one is dependent on the other.

When used in a schema definition, each package produces a separate schema file. The implementation of dependencies varies among alternative schema languages. For DTDs they might become external entity references. For the W3C XML Schema, these package dependencies create either <include> or <import> elements, based on whether or not the target namespaces of related packages are equal. A dependency is shown from the PO package to the XSD_Datatypes package, but an import element is not created because this datatype library is inherently available as part of the XML Schema language.
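The choice between include and import reduces to a comparison of target namespaces. Here is a minimal Python sketch of that decision, assuming each package carries a target namespace and a schema file name. The function name, the Address namespace URI, and the file name address.xsd are hypothetical; only the IPO namespace appears in the sample document.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def dependency_element(source_ns, target_ns, schema_location):
    """Return the schema element for one package dependency, or None."""
    if target_ns == XSD_NAMESPACE:
        # The built-in datatype library needs neither include nor import.
        return None
    if target_ns == source_ns:
        # Same target namespace: the other file is part of the same vocabulary.
        return f'<xs:include schemaLocation="{schema_location}"/>'
    # Different target namespace: the other vocabulary is imported.
    return (f'<xs:import namespace="{target_ns}" '
            f'schemaLocation="{schema_location}"/>')


PO_NS = "http://www.example.com/IPO"
ADDRESS_NS = "http://www.example.com/Address"  # hypothetical namespace

print(dependency_element(PO_NS, PO_NS, "address.xsd"))
print(dependency_element(PO_NS, ADDRESS_NS, "address.xsd"))
print(dependency_element(PO_NS, XSD_NAMESPACE, ""))
```

The third call returns None, mirroring the dependency on XSD_Datatypes that appears in the package diagram but produces no element in the schema.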


Figure 4: UML package diagram showing schema integration

This object-oriented approach to XML schema design facilitates modular reuse, just as one would do when using languages such as Java or C++. A new vocabulary module could import our current Address package and define a new subclass of Address or further specialize USAddress with a new subclass. For example, BusinessUSAddress might be created with a new mailstop attribute. When transformed to XML Schema, this new subtype would automatically become available as valid content for the shipTo or billTo elements in a purchaseOrder. This is conceptually similar to the way one would create a new Java subclass within a new application-specific package; other libraries are reused by importing, and possibly extending, their classes.
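To show what such a subtype might look like once transformed, the Python sketch below builds a complexType that extends USAddress with a mailstop element. Several details are assumptions made for illustration: that generalization maps to an XSD extension, that mailstop is an xs:string element, and that the Address module is bound to a prefix named addr. The mapping rules actually used in this series are described in Part II.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def build_subtype(name, base, new_elements):
    """Build a complexType that extends a base type with extra elements."""
    ctype = ET.Element(f"{{{XS}}}complexType", {"name": name})
    content = ET.SubElement(ctype, f"{{{XS}}}complexContent")
    extension = ET.SubElement(content, f"{{{XS}}}extension", {"base": base})
    sequence = ET.SubElement(extension, f"{{{XS}}}sequence")
    for elem_name, elem_type in new_elements:
        ET.SubElement(sequence, f"{{{XS}}}element",
                      {"name": elem_name, "type": elem_type})
    return ctype


# "addr" is a hypothetical prefix for the imported Address module.
subtype = build_subtype("BusinessUSAddress", "addr:USAddress",
                        [("mailstop", "xs:string")])
print(ET.tostring(subtype, encoding="unicode"))
```

An instance document would then select the new type on shipTo or billTo with an xsi:type attribute, in the same way the earlier sample selects ipo:UKAddress.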

Tips for Success

In order to help you when applying these ideas to your own e-business projects, I offer the following tips for success:

  1. Plan for reuse; shortsighted design leads to short-term use. Many of the same principles developed over the past decade for object-oriented and component-based reuse can be applied to XML applications. New web services standards, such as WSDL and WSFL, add the behavioral interface of reusable modules, complementing the structure defined by schema information models.
  2. Follow a consistent set of design guidelines. Whenever possible, set model-level default properties in the UML profile and avoid overriding the mappings for individual classes, attributes, and associations.
  3. Choose UML tools that provide complete support for the XMI model interchange standard. It is often not practical or desirable to be locked into one design tool. In particular, code generation and reverse engineering tools can be built around the XMI document format, leveraging the strength and flexibility of general-purpose XML tools.