Modeling XML Vocabularies with UML: Part III
October 10, 2001
This article is the third installment in a series on using UML to model XML vocabularies. The examples are based on a simple purchase order schema included in the W3C XML Schema Primer, and we've followed an incremental development approach to define and refine this vocabulary model with UML class diagrams. The first objective of this third article is to complete the process of refining the PO model so that the resulting schema is functionally equivalent to the one contained in the XSD Primer.
The second objective is to broaden our perspective for understanding how UML can contribute to the analysis and design of XML applications. This subject is of book-length complexity (I've written one book on it and am planning another), but more specific goals are possible if we keep our focus on the design of XML vocabularies. The following list summarizes several goals that guide our work.
- Create a valid XML schema from any UML class structure model, as described in the first two parts of this series.
- Refine the conceptual model to a design model specialized for XML schema by adding stereotypes and properties that are based on a customization profile for UML.
- Support a bi-directional mapping between UML and XSD, including reverse engineering existing XML schemas into UML models.
- Design and deploy XML vocabularies by assembling reusable modules.
- Integrate XML and non-XML information models in UML to represent, for example, both XML schemas and relational database schemas in a larger system.
Even this relatively narrow scope covers a broad terrain. The following introduction to a UML profile for XML adds a critical step toward all of these goals. These extensions to UML allow schema designers to satisfy specific architectural and deployment requirements, analogous to physical database design in an RDBMS. These same extensions are necessary when reverse engineering existing schemas into UML because we must map arbitrary schema structures into an object-oriented model. But don't get hung up on the details of the UML profile; in most cases, a few well-placed stereotypes and properties will achieve your design objectives without distracting from the clarity of vision in your conceptual model.
UML provides a foundation for modeling structure and behavior of most software systems, but there are domain-specific situations that require additional model information to be captured by the analyst beyond what is possible with UML. This issue is solved through the use of UML extension profiles. A UML profile has three key items: stereotypes, tagged values (properties), and constraints. A profile provides a definition of these items and explains how they extend the UML in a particular domain, which is XML schema design in our case.
We had a brief encounter with UML stereotypes previously: <<XSDsimpleType>> was used to indicate that a class should be mapped to a user-defined datatype in the schema. A complete definition of the UML profile for XML Schema is beyond the scope of this article, but it's included in an appendix to my book and soon will be posted on XMLmodeling.com. Three stereotypes are introduced here, along with a few tagged value properties. Their use will become clearer in the following sections, where we apply them to customize the purchase order model.
Each stereotype is assigned to one or more UML constructs that are modified by the profile extension. Each stereotype can be further specified by adding one or more properties that refine its meaning or impact on a model. For example, a stereotype assigned to a UML class extends the meaning of a "class" within the profile's domain and the stereotype's properties are added to the specification of that class in the model. Three stereotypes from the UML Profile for XML Schema are summarized as follows:
<<XSDcomplexType>> on a UML class
- modelGroup (all | sequence | choice)
- attributeMapping (element | attribute)
- roleMapping (element | attribute)
- elementNameMapping (upperCamelCase | lowerCamelCase | hypenLowerCase | omitElement )
<<XSDelement>> on a UML attribute or association end
- position (integer value) within a sequence model group
- anonymousType (true | false)
- anonymousRole (true | false)
<<XSDattribute>> on a UML attribute or association end
- use (prohibited | optional | required | fixed)
Other stereotypes can be used (e.g. <<XSDsimpleType>> on a UML class, or <<XSDfacet>> on a UML attribute) to modify the meaning of those structures in the XML schema without specifying additional properties.
Many of the profile property values can be set as defaults for an entire UML model or for one package within that model. For example, the default modelGroup property can be set to 'sequence' for all classes within the model, instead of setting this property on each class individually. Similarly, the default attributeMapping can be set to 'element' so that all UML attributes are produced as XML elements in the schema, unless overridden for one or more individual classes. Individual UML attributes also can be assigned a stereotype that overrides the class or package default mapping.
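The layering of defaults and overrides described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not how any particular tool stores profile properties; the function name `resolve_property` and the dictionary layout are assumptions made for this example.

```python
# Hypothetical representation: each scope is a dict of profile property
# values. The most specific scope that sets a property wins.
def resolve_property(name, model, package=None, cls=None, attribute=None):
    """Return the value of a profile property, checking the UML attribute
    first, then its class, then the package, then the model defaults."""
    for scope in (attribute, cls, package, model):
        if scope and name in scope:
            return scope[name]
    return None


# Model-wide defaults, as suggested in the text.
model_defaults = {"modelGroup": "sequence", "attributeMapping": "element"}

# The PurchaseOrder class sets nothing itself (assumed for illustration).
purchase_order_class = {}

# An <<XSDattribute>> stereotype on orderDate overrides the default mapping.
order_date = {"attributeMapping": "attribute"}

print(resolve_property("attributeMapping", model_defaults,
                       cls=purchase_order_class))
print(resolve_property("attributeMapping", model_defaults,
                       cls=purchase_order_class, attribute=order_date))
```

The first call falls through to the model default, while the second picks up the override on the individual UML attribute.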
Consider the following sample XML document, which is a fragment from an example in the XSD Primer:
<ipo:purchaseOrder
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ipo="http://www.example.com/IPO"
    orderDate="1999-12-01">
  <ipo:shipTo exportCode="1" xsi:type="ipo:UKAddress">
    <ipo:name>Helen Zoe</ipo:name>
    <ipo:street>47 Eden Street</ipo:street>
    <ipo:city>Cambridge</ipo:city>
    <ipo:postcode>CB1 1JR</ipo:postcode>
  </ipo:shipTo>
  . . .
</ipo:purchaseOrder>
We'll use this instance to derive requirements for refining the UML model. These design requirements are divided into four categories:
- Should the attributes of a UML class be produced as XML attributes or child elements in the schema?
- Which kind of model group (all, sequence, or choice) should be used to validate an element's content?
- Should we choose to include or exclude XML element tags that represent class names and roles in the UML associations?
- How do we map UML class names to XML element names?
The UML class diagram shown in Figure 1 includes profile extensions that resolve all of these design choices. This purchase order model should be very familiar by now. It was presented as a conceptual model of the vocabulary in two previous articles and is now refined to include stereotypes and properties that specify the XML schema design model. It's important to note that this is exactly the same structure as shown in previous diagrams, with a few additional labels added.
Figure 1: Design model of purchase order vocabulary
After applying these profile extensions, the following schema is produced for the PurchaseOrder class and its associations:
<xs:element name="purchaseOrder" type="ipo:PurchaseOrder"/>
<xs:complexType name="PurchaseOrder">
  <xs:sequence>
    <xs:element name="shipTo" type="ipo:Address"/>
    <xs:element name="billTo" type="ipo:Address"/>
    <xs:element name="comment" type="xs:string" minOccurs="0" maxOccurs="1"/>
    <xs:element name="items" minOccurs="0" maxOccurs="1">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="ipo:item" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
  <xs:attribute name="orderDate" type="xs:date"/>
</xs:complexType>
The sample purchase order instance document includes two XML attributes: orderDate on the purchaseOrder element and exportCode on the shipTo element. By assigning an <<XSDattribute>> stereotype to the orderDate attribute in UML, we specify that it should be represented as an attribute in XML. The exportCode attribute is similarly stereotyped on the UKAddress class, although it's not shown in the diagram. The comment UML attribute in the PurchaseOrder class follows the default mapping to an element in the schema.
The XSD Primer uses a <sequence> model group for all complexType content, whereas the default UML mapping uses an <all> unordered content model. To modify this mapping, we assign the <<XSDcomplexType>> stereotype to the PurchaseOrder class and set the modelGroup property to 'sequence'.
But the use of a sequence model group raises a new issue when mapping from UML to XML schemas. UML attributes and associations are inherently unordered within their owning class. So each UML attribute and association end that is part of a sequence group must be annotated with a profile property that specifies its position. These position property values are shown as annotations in Figure 1. The procedure for adding profile stereotypes and property values is different in each UML tool, although any tool that claims compliance with the UML specification must provide some means for adding them.
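To make the role of the position property concrete, here is a minimal Python sketch that orders unordered UML members by position before emitting an xs:sequence. The tuple layout and the function name `build_sequence` are assumptions for illustration, and the position values are chosen to match the element order in the schema shown earlier rather than read from Figure 1.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def build_sequence(members):
    """Build an xs:sequence from (name, type, position) tuples.

    The members may arrive in any order, since UML attributes and
    association ends are unordered; 'position' fixes the schema order.
    """
    sequence = ET.Element(f"{{{XS}}}sequence")
    for name, type_name, _position in sorted(members, key=lambda m: m[2]):
        ET.SubElement(sequence, f"{{{XS}}}element",
                      {"name": name, "type": type_name})
    return sequence


# Deliberately shuffled input; positions are assumed values.
members = [
    ("comment", "xs:string", 3),
    ("billTo", "ipo:Address", 2),
    ("shipTo", "ipo:Address", 1),
]

sequence = build_sequence(members)
names = [child.get("name") for child in sequence]
print(names)
print(ET.tostring(sequence, encoding="unicode"))
```

Without the position values there would be no principled way to choose between the possible orderings of these three members.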
The default mapping rules allow an Address element (or one of its subclasses) to be contained within the association role elements for shipTo and billTo (see Part II of this series), whereas the required instance document omits the Address tag and embeds its element and attribute content directly within the role tag. To specify this design choice, the <<XSDelement>> stereotype is assigned to the association ends connected to the Address class and the anonymousType property is set to 'true'. The stereotype label is omitted from the diagram to minimize clutter, but the tagged value properties are listed within curly braces.
Because the items role on the association to the Item class is not specified as an anonymousType, its definition in the schema shown above retains the role's container element to hold elements for the related class. A document instance for purchase order items looks like
<ipo:purchaseOrder>
  <ipo:items>
    <ipo:item partNum="833-AA">
      <ipo:productName>Lapis necklace</ipo:productName>
      <ipo:quantity>1</ipo:quantity>
      <ipo:USPrice>99.95</ipo:USPrice>
      <ipo:comment>Want this for the holidays!</ipo:comment>
      <ipo:shipDate>1999-12-05</ipo:shipDate>
    </ipo:item>
  </ipo:items>
</ipo:purchaseOrder>
In this situation "anonymous type" has a slightly different, more general meaning than the one used in the W3C XML Schema specification. It's easiest to understand the meaning I intend by looking at the UML class diagram in Figure 1 rather than at the XSD Schema document. In the class diagram, if an association end is marked as an anonymousType, then the name of the associated class is anonymous when its instances appear in XML documents, regardless of which schema language is actually used to define those documents. The concept of anonymous types is realized differently in different schema languages.
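The effect of anonymousType on instance documents can be sketched independently of any schema language. The Python below is an illustration only: the function `add_role` and its parameters are hypothetical, namespaces are omitted for brevity, and the address fields are a subset of those in the sample document.

```python
import xml.etree.ElementTree as ET


def add_role(parent, role, class_name, fields, anonymous_type):
    """Add an association role element under 'parent'.

    With anonymous_type=False the associated class appears as a named
    element inside the role element (the default mapping). With
    anonymous_type=True the class name is omitted and its content is
    embedded directly within the role element.
    """
    role_element = ET.SubElement(parent, role)
    if anonymous_type:
        container = role_element
    else:
        container = ET.SubElement(role_element, class_name)
    for name, value in fields.items():
        ET.SubElement(container, name).text = value
    return role_element


address = {"name": "Helen Zoe", "city": "Cambridge"}

po = ET.Element("purchaseOrder")
default_role = add_role(po, "billTo", "Address", address, anonymous_type=False)
anonymous_role = add_role(po, "shipTo", "Address", address, anonymous_type=True)

print(ET.tostring(po, encoding="unicode"))
```

In the output, billTo wraps its content in an Address element, while shipTo holds the name and city elements directly, which is the form required by the Primer's instance document.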
You may have noticed that the XML document elements for purchaseOrder and item appear with a lower-case first character; this is often called "lower camel case" format. However, the default mapping from UML creates these element names identical to the class names, which begin with upper-case letters. The "upper camel case" convention used in the UML diagram is commonly used in object-oriented models and languages, but a variety of conventions are followed in current XML schema vocabularies.
This issue is resolved by adding an elementNameMapping property to a UML class along with the <<XSDcomplexType>> stereotype. This profile property allows an XML schema designer to choose a preferred naming convention when modeling schema details. Like many other profile properties, this value can be set as a default for the entire model so that all class names will be mapped to XML element names in the same way.
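A name mapping of this kind is easy to sketch. The Python function below is a hypothetical illustration, not the profile's definition: the property values are spelled as in the profile listing above, the exact output of the hyphenated convention is my assumption, and the omitElement value is left out because its behavior isn't described here.

```python
import re


def map_element_name(class_name, convention):
    """Map a UML class name to an XML element name (illustrative only)."""
    if convention == "upperCamelCase":
        return class_name[:1].upper() + class_name[1:]
    if convention == "lowerCamelCase":
        return class_name[:1].lower() + class_name[1:]
    if convention == "hypenLowerCase":
        # Assumed behavior: insert a hyphen at each lower-to-upper
        # boundary, then lower-case the whole name.
        return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", class_name).lower()
    raise ValueError(f"unsupported elementNameMapping: {convention}")


print(map_element_name("PurchaseOrder", "lowerCamelCase"))
print(map_element_name("PurchaseOrder", "upperCamelCase"))
print(map_element_name("PurchaseOrder", "hypenLowerCase"))
```

Setting the convention once as a model default means every class passes through the same function, which keeps element names consistent across the vocabulary.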
To support an iterative modeling process, we have developed a web application that creates XML schemas from UML models. A key enabling technology is the XML Metadata Interchange (XMI) specification from the OMG that defines a standard for serializing UML models as XML documents. Many UML tools now support this standard import-export format, and some use it as their native file format. hyperModel accepts any XMI 1.0 file containing a UML 1.3 model and transforms it into either HTML or several alternative XML schema languages.
I created the purchase order model in both Rational Rose (www.rational.com) and the open source ArgoUML tools. XMI files can be exported from Rational Rose by using the add-in developed by Unisys and available for download from Rational Software. ArgoUML uses XMI as its native file format. When the purchase order model from either tool is uploaded to hyperModel, the HTML view shown in Figure 2 is displayed in a Web browser.
Figure 2: An HTML view of UML in hyperModel
hyperModel uses XSLT stylesheets to transform the XMI files into other representations, including HTML and XSD schemas. It allows individual classes or entire packages to be transformed and displayed in a browser. When the output transformation is changed to XSD, then the schema representation of the selected class is displayed, as shown in Figure 3. Selecting a UML package, e.g. PO or Address, will display the complete schema for a module in our vocabulary design.
Figure 3: Transformation of a UML class to XML Schema
We have found this application to be a tremendous help in learning about and refining an object-oriented approach to the analysis and design of XML schemas. We also use this tool as an integral part of our training classes on modeling XML vocabularies. Even if you don't use UML as a primary design tool in your project, experimenting with schema models and alternative UML profile customizations is a productive way to understand XML schema structures and design guidelines.
One of the benefits gained by using UML as part of our XML development process is that it enables a thoughtful approach to modular, maintainable, reusable application components. In the first two parts of this series, the PurchaseOrder and Address elements were specified in two separate diagrams, implying reusable submodels. UML includes package and namespace structures for making these modules explicit and also specifying dependencies between them.
Figure 4 illustrates a UML package diagram of this purchase order model. A package, shown as a file folder in a diagram, defines a separate namespace for all model elements within it, including additional subpackages. These UML packages are a very natural counterpart to XML namespaces. A dashed line arrow between two packages indicates that one is dependent on the other.
When used in a schema definition, each package produces a separate schema file. The implementation of dependencies varies among alternative schema languages. For DTDs they might become external entity references. For the W3C XML Schema, these package dependencies create either <include> or <import> elements, based on whether or not the target namespaces of related packages are equal. A dependency is shown from the PO package to the XSD_Datatypes package, but an import element is not created because this datatype library is inherently available as part of the XML Schema language.
Figure 4: UML package diagram showing schema integration
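The rule for turning a package dependency into an XML Schema construct can be summarized in a short Python sketch. The function name, the `builtin` flag, and the Address namespace URI are assumptions for illustration; only the IPO namespace comes from the sample document.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def dependency_element(source_namespace, target_namespace, builtin=False):
    """Choose the schema element that realizes a package dependency.

    Returns "include" when both packages share a target namespace,
    "import" when they differ, and None for the built-in datatype
    library, which needs no explicit reference.
    """
    if builtin:
        return None
    if source_namespace == target_namespace:
        return "include"
    return "import"


po_namespace = "http://www.example.com/IPO"
address_namespace = "http://www.example.com/Address"  # hypothetical URI

print(dependency_element(po_namespace, po_namespace))
print(dependency_element(po_namespace, address_namespace))
print(dependency_element(po_namespace, XSD_NAMESPACE, builtin=True))
```

Whether the PO and Address packages in Figure 4 actually share a target namespace is a design decision; the sketch simply shows that the choice between the two elements follows mechanically from it.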
This object-oriented approach to XML schema design facilitates modular reuse, just as you would do when using languages such as Java or C++. A new vocabulary module could import the current Address package and define a new subclass of Address or further specialize USAddress with a new subclass. For example, BusinessUSAddress might be created with a new mailstop attribute. When transformed to XML Schema, this new subtype would automatically become available as valid content for the shipTo and billTo elements in a purchaseOrder. This is conceptually similar to the way one would create a new Java subclass within a new application-specific package, where other libraries are reused by importing, and possibly extending, their classes.
In order to help you when applying these ideas to your own e-business projects, I offer the following tips for success:
- Plan for reuse; shortsighted design leads to short-term use. Many of the same principles developed over the past decade for object-oriented and component-based reuse can be applied to XML applications. New web services standards, such as WSDL and WSFL, add the behavioral interface of reusable modules, complementing the structure defined by schema information models.
- Follow a consistent set of design guidelines. Whenever possible, set model-level default properties in the UML profile and avoid overriding the mappings for individual classes, attributes, and associations.
- Choose UML tools that provide complete support for the XMI model interchange standard. It is often not practical or desirable to be locked into one design tool. In particular, code generation and reverse engineering tools can be built around the XMI document format, leveraging the strength and flexibility of general-purpose XML tools.