XML.com: XML From the Inside Out


Modeling XML Vocabularies with UML: Part I

August 22, 2001

A Russian translation of this article is available here.

The arrival of the W3C's XML Schema specification has evoked a variety of responses from software developers, system integrators, XML document analysts, authors, and designers of B2B vocabularies. Some like the richer structure and semantics that can be expressed with these new schemas as compared to DTDs, while others complain about excessive complexity. Many find that the resulting schemas are difficult to share with wider audiences of users and business partners.

I look past many of these differences of opinion to view XML Schema simply as implementation syntax for models of business vocabularies. Other forms of model representation and presentation are more effective than W3C XML Schema when specifying new vocabularies or sharing definitions with users. In particular, I favor the Unified Modeling Language (UML) as a widely adopted standard for system specification and design. My goal in this article and in this series is to share some thoughts about how these two standards are complementary and to work through a simple example that makes the ideas concrete.

Although this discussion is focused on the W3C XML Schema specification, the same concepts are easily transferred to other XML schema languages. Indeed, I have already applied the same techniques to creating and reverse engineering DTDs and SOX schemas, as well as RELAX, TREX, and RELAX NG. In general, I use the term "schema" when referring to the family of XML schema languages.

The Role of Models in XML Applications

It can be difficult to understand the breadth of a large multi-enterprise system. Most people need to divide and conquer the problem as a set of alternate models and views. Each of these models deliberately ignores aspects of the system that are not relevant to its purpose. Building such models is fundamental to the way we cope with the complexity of everyday life: we ignore unnecessary details so that we can focus on the task at hand. Different stakeholder groups have different needs with respect to abstraction and focus.

In the context of B2B system integration, all business partners must agree on the information models that define the vocabulary for task-oriented communication. The models include both the data structures of the XML documents that are exchanged and the process models of the extended dialogs that are required to complete complex business transactions.

Historically, in system analysis and design, a variety of techniques, tools, and methodologies has existed for guiding and supporting these alternative models of system structure and behavior. In the absence of formal methods or tools, models are created using PowerPoint, Visio, or paper and pencil to help communicate a system's purpose and function. And when there are no written models, system architects work from mental models as a way to comprehend the whole and its parts. An XML schema is also a vocabulary model written in the syntax of that specification language.

A high-level process for developing XML vocabularies is shown in Figure 1 below. It includes three decision points that determine the final vocabulary definition, regardless of which schema language is used. Data-oriented and text-oriented applications may have different usage requirements. For example, a data-oriented vocabulary can be optimized for serialization of objects or database query results, and its constraints should be carefully aligned with the datatypes and referential integrity constraints of its sources. These data-oriented documents may never be viewed by humans, other than by developers testing the application.

A text-oriented vocabulary often has human users who need to edit the XML documents, with or without the assistance of GUI editing tools. Its structure must be easily understood by people who write stylesheets that transform and present the documents' content. An XML vocabulary design that works perfectly for data interchange might cause human users unnecessary pain and distress. Don't forget the needs of your users when creating the XML schema!

Figure 1: UML activity diagram for schema development process

The process diagram in Figure 1 is a UML activity diagram, which is one of nine diagram types defined by that standard. This diagram was created using Rational Rose, one of the most widely used UML modeling tools. Most of our discussion, however, is focused on the UML class diagram that is used to specify the static information structure of a system's XML vocabulary in our application context.

What is UML?

The Unified Modeling Language (UML) defines a standard language and graphical notation for creating models of business and technical systems. Contrary to popular opinion, UML is not limited to use as a tool for programmers. The UML defines model types that span a range from functional requirements and activity workflow models to class structure design and component diagrams. These models, and a development process that uses them, improve and simplify communication among an application's many diverse stakeholders.

A UML class diagram can be constructed to represent the elements, relationships, and constraints of an XML vocabulary visually. With a little initial coaching, class diagrams allow complex vocabularies to be shared with non-technical business stakeholders. A very simple subset of a product catalog vocabulary is shown as a class diagram in Figure 2 [1].

Figure 2: A simple UML class diagram

The primary elements of a UML class diagram are as follows.

  • Class -- this example defines two classes: CatalogItem and Organization. A class represents an aggregation of structural features and defines a namespace for those feature names. Thus, both classes can contain an attribute named "name" but their class namespace scope makes the two attributes distinct.
  • Attribute -- each class may optionally define a set of attributes. Each attribute has a type; in this example string, double, and float refer to the built-in datatypes as defined by the XML Schema specification. For those of you thinking ahead to XML schema design, specifying a UML attribute does not limit the schema to an XML attribute; the mapping to schema syntax allows either an XML attribute or child element.
  • Operation -- the computeTax() operation of CatalogItem specifies part of the behavior for this class. In other words, what does the class do, in addition to defining the structure of its data? In object-oriented parlance, if you send a computeTax message to a CatalogItem object, it will return a floating-point data value. This operation does not expect any parameters, but they could be specified between the parentheses. We will not use class operations in the specification of XML vocabulary, but their definition would be critical to Web Services, especially a WSDL specification of SOAP messages.
  • Association -- an association relates two or more classes in a model. An arrow on one end of an association means that it is usually navigated in that direction, which provides a hint for the design and implementation of this vocabulary.
  • Role & Multiplicity -- the end of an association may specify the role of the class; the Organization plays a supplier role for a CatalogItem in this model. In addition, the "1..*" multiplicity means that there must be one or more suppliers for each catalog item.
  • Generalization -- although Figure 2 does not include class inheritance, this structure is fundamental to object-oriented models and is included in the next expanded example.
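
As a rough illustration, the elements listed above could be sketched in Python along the following lines. This is only a sketch under stated assumptions, not the mapping used in this series: the list_price attribute, the tax rate, and the to_xml and _put helpers are hypothetical names introduced for illustration, and Figure 2 may define different attributes. The sketch shows two classes that each own a "name" attribute, the "1..*" multiplicity on the supplier role, and the point that a UML attribute can be written out as either an XML attribute or a child element.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET

# Hypothetical rate, used only so that compute_tax() has something to return.
ASSUMED_TAX_RATE = 0.05


@dataclass
class Organization:
    name: str  # "name" is scoped to Organization


@dataclass
class CatalogItem:
    name: str                     # distinct from Organization.name
    list_price: float             # hypothetical attribute standing in for a "double"
    supplier: List[Organization]  # association end with role "supplier"

    def __post_init__(self) -> None:
        # Enforce the "1..*" multiplicity on the supplier role.
        if len(self.supplier) < 1:
            raise ValueError("a CatalogItem needs at least one supplier")

    def compute_tax(self) -> float:
        # Takes no parameters and returns a floating-point value.
        return self.list_price * ASSUMED_TAX_RATE


def _put(parent: ET.Element, name: str, value: str, as_attribute: bool) -> None:
    # Write one UML attribute as an XML attribute or as a child element.
    if as_attribute:
        parent.set(name, value)
    else:
        ET.SubElement(parent, name).text = value


def to_xml(item: CatalogItem, as_attributes: bool) -> ET.Element:
    # Serialize a CatalogItem; the association becomes nested supplier elements.
    root = ET.Element("CatalogItem")
    _put(root, "name", item.name, as_attributes)
    _put(root, "listPrice", repr(item.list_price), as_attributes)
    for org in item.supplier:
        supplier = ET.SubElement(root, "supplier")
        _put(supplier, "name", org.name, as_attributes)
    return root
```

Calling to_xml(item, as_attributes=True) and to_xml(item, as_attributes=False) on the same object yields two different document shapes from one model, which is the choice a schema designer has to make when mapping a class diagram to schema syntax.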
