XML.com: XML From the Inside Out

Architectural Design Patterns for XML Documents

March 26, 2003

Introduction

No one wants to reinvent the wheel. One way programmers try to reuse good ideas about object design is to look to catalogs of design patterns, most famously the Gang of Four's Design Patterns: Elements of Reusable Object-Oriented Software (Gamma et al.). XML has been used enough now that some high-level patterns are starting to emerge. Some patterns revolve around the low-level details of good schema design, like those put together by Dare Obasanjo in "W3C XML Schema Design Patterns"; but when you have a blank sheet of paper in front of you and you're ready to start designing your new XML format, you want patterns to guide you at a higher level. This article attempts to document a few whole-document design patterns that have proven themselves in the field.

Dynamic Document

Abstract

In this pattern the XML is not typed by a DTD or schema; instead, its structure follows the accessors of the underlying program objects. It allows unlimited extension by multiple, uncoordinated parties at the cost of type checking, and it is simple to implement, with supporting libraries abounding (e.g. Apache Commons for Java; .NET's XML marshalling for C#).

Problem

You need to develop a format quickly, or many different people are contributing on an ad-hoc basis at different times, and it's not possible to have a fixed document design.

Context

This pattern is more common for private formats or technical ones, such as configuration for a server or a marshalling format. It is also a good match for Extreme Programming projects because you can get it working quickly and refactor later to use another mechanism if needed.

Forces

  • You need a "quick and dirty" solution.
  • You can't know beforehand what extensions will be required, but you know they will be many and created by people other than the original document format creator.

Solution

Don't design a format at all, and drop validation. Let a technical solution -- that is, a marshaller -- drive the XML generation. As data structures in your program change, the generated XML changes. In both .NET and Java, the marshaller uses reflection and extra metadata (.NET CLR attributes or JavaBean BeanInfo classes) to find the read/write properties of a class. It moves recursively through the object graph, generating a tree of XML elements named after the accessors. For example, these two classes:

    public class Person {
        public String getName() { ... }
        public void setName(String name) { ... }
        public Address getAddress() { ... }
        public void setAddress(Address address) { ... }
    }

    public class Address {
        public String getCity() { ... }
        public void setCity(String city) { ... }
        public String getState() { ... }
        public void setState(String state) { ... }
    }

might be marshalled as

    <person>
        <name>Kyle Downey</name>
        <address>
            <city>Forest Hills</city>
            <state>Queens</state>
        </address>
    </person>
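
The recursive walk described above can be sketched in a few lines of Java. This is only an illustration, not the code of any particular marshalling library: the class name BeanMarshaller, the marshal and escape helpers, and the nested Person and Address beans are all made up for this example. It uses the standard java.beans.Introspector to find read/write properties, treats strings, numbers and booleans as leaf text, skips null values, and makes no attempt to handle collections or cyclic object graphs.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Arrays;
import java.util.Comparator;

class BeanMarshaller {

    // Hypothetical beans with the same accessors as the classes above.
    public static class Address {
        private String city;
        private String state;
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
        public String getState() { return state; }
        public void setState(String state) { this.state = state; }
    }

    public static class Person {
        private String name;
        private Address address;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Address getAddress() { return address; }
        public void setAddress(Address address) { this.address = address; }
    }

    // Marshals a value as <elementName>...</elementName>, recursing into beans.
    static String marshal(String elementName, Object value) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(elementName).append('>');
        if (value instanceof String || value instanceof Number || value instanceof Boolean) {
            // Leaf value: write it as escaped text.
            xml.append(escape(value.toString()));
        } else if (value != null) {
            try {
                PropertyDescriptor[] props =
                        Introspector.getBeanInfo(value.getClass()).getPropertyDescriptors();
                // Sort by property name so the output order is predictable.
                Arrays.sort(props, Comparator.comparing(PropertyDescriptor::getName));
                for (PropertyDescriptor prop : props) {
                    // Only read/write properties count; this also skips getClass().
                    if (prop.getReadMethod() == null || prop.getWriteMethod() == null) {
                        continue;
                    }
                    Object child = prop.getReadMethod().invoke(value);
                    if (child != null) {
                        xml.append(marshal(prop.getName(), child));
                    }
                }
            } catch (IntrospectionException | ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot marshal " + value.getClass(), e);
            }
        }
        xml.append("</").append(elementName).append('>');
        return xml.toString();
    }

    // Minimal escaping for element text.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling BeanMarshaller.marshal("person", person) on a populated Person yields the same elements as the sample document, without the indentation. Because this sketch sorts properties by name, address comes before name. Adding a new read/write property to either class changes the output with no further work, which is both the appeal and the risk of this pattern.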

Discussion

Before sitting down to do a potentially complex document design, you should always ask yourself if a dynamic, data-driven format might be sufficient. Most XML-aware development platforms provide at least one library that will take an object and convert it into XML. You've done the object design, and in a couple lines of code, you've done your document design as well. If you're on a tight deadline, this is a potentially big time-saver for the development team.

But not so fast. Dynamic Document most likely isn't an option for you if

  • you're designing a long-lived business-critical exchange format and thus you don't want the format to change whenever you change your object design; or
  • you don't trust the producers of the data to get it right, and the cost of a mistake is high. For example, if the document notifies you about inventory changes at a partner's warehouse, the lack of validation is risky.

Related Patterns

None. This is the "zero design pattern design." Once you start to involve other patterns, you're enforcing a human design rather than having a dynamic document.

Known Uses

  • Ant build.xml
  • Apache Tomcat server.xml
  • JDK 1.4 JavaBean XML persistence
  • .NET XML Marshalling
  • SOAP default encoding

Composition

Abstract

Wherever possible, define the format using existing standards, referencing their elements by namespace rather than rolling your own. For example, add metadata to your document using RDF and the Dublin Core extensions rather than inventing your own <author> and <description> tags. This allows for independent evolution of markup by the parties who know the business domain best.

Problem

You have an existing or planned document format that provides common types of data using its own, proprietary elements and types, and you're forced to maintain and understand that subset of data yourself, even though you're not a domain specialist.

Context

With all the standardization work out there, just about any business-oriented document problem presents an opportunity for defining some elements with Composition.

Forces

  • There is an opportunity to reuse a <simpleType>, <complexType> or <element> from another XML schema.
  • You can accept or even want to have the composed data type definition evolve independently of your own efforts.
  • Patents or other legal encumbrances do not prevent you from reusing that schema.

Solution

XML namespaces make it very easy to import entire elements from one spec to another. Let's say you're designing a format for capturing use cases. You want to include attribution information: who wrote it, when, etc. You might want to consider using the Dublin Core RDF elements instead of defining your own <author> and other meta-information tags:

<uc:use-case 
  xmlns:uc="http://example.com/my/usecase.xsd" 
  id="3">
  <uc:metadata>
    <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
      xmlns:dc="http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-xsd.xsd">
      <rdf:Description>
        <dc:title>Irritate Customer</dc:title>
        <dc:creator>Kyle Downey</dc:creator>
        <dc:date>2002-03-08</dc:date>
        <dc:format>text/xml</dc:format>
        <dc:language>en</dc:language>
        <dc:contributor>Amber Archer Consulting Co., Inc.</dc:contributor>
        <dc:identifier>UC#3</dc:identifier>
      </rdf:Description>
    </rdf:RDF>
  </uc:metadata>
...
</uc:use-case>

In your use case schema you would have (in part)

<schema
  xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <import
      namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      schemaLocation="http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-rdf.xsd"
    />

    <element name="metadata">
      <complexType>
        <sequence>
          <element ref="rdf:RDF"/>
        </sequence>
      </complexType>
    </element>
</schema>
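
On the consuming side, code that reads such a document should find the reused elements by namespace URI, not by whatever prefix the author happened to pick. The following is a minimal sketch using the JDK's built-in DOM parser. The class name UseCaseMetadataReader and the dcValue helper are invented for this example, and the DC_NS constant simply repeats the namespace URI used in the sample document above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class UseCaseMetadataReader {

    // Namespace URI bound to the dc: prefix in the sample use-case document.
    static final String DC_NS =
            "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-xsd.xsd";

    // Returns the text of the first element with the given local name in the
    // Dublin Core namespace, or null if the document has no such element.
    static String dcValue(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this the parser ignores namespaces and the lookup finds nothing.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = doc.getElementsByTagNameNS(DC_NS, localName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read use-case metadata", e);
        }
    }
}
```

For the sample document, dcValue(xml, "creator") returns the contents of the <dc:creator> element. The lookup works equally well if the document binds the same namespace to a different prefix, which is what lets the use-case format and the metadata vocabulary evolve separately.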

Discussion

One of the strong arguments for Composition -- aside from the well-documented programmer's virtue of laziness -- is that you can lean on the more specialized knowledge of others. The people who put together Dublin Core put a lot of thought into how to best represent document metadata. They have been doing it since 1994. Most likely, you've been thinking about how to put meta-information into your document since two paragraphs ago. There's no match. So your choice is either to get taken down by an angry librarian who's breaking noses and taking names or reuse the work. This design pattern recommends the latter.

As RDF and Dublin Core evolve, all you have to do is change the namespace and the import statement to point to a newer version of the schema. With very little effort, you can take advantage of the latest and greatest ways of representing metadata, widgets, documents, customers, fixed income instruments, or whatever it is you're reusing. This capacity for concurrent evolution is, however, also the biggest gotcha in Composition. Unless the promoters of your standard have done the right thing and put version information in the namespace and schema URI, there's a risk that users in the field will suddenly start getting a backward-incompatible version 2.0 of the schema and get very angry. So keep an eye on versioning, and if necessary copy the schema to your own namespace and reuse it from there.
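
One way to keep an eye on versioning in code is to have the consumer pin the namespace URIs it was built against and report anything else it encounters. The sketch below is one possible approach and rests on assumptions: the class name NamespacePinCheck and the unexpectedNamespaces helper are hypothetical, and the check only helps when the reused standard actually encodes its version in the namespace URI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class NamespacePinCheck {

    // Returns every element namespace URI in the document that is not in the
    // pinned set. An empty result means the document uses only known versions.
    static Set<String> unexpectedNamespaces(String xml, Set<String> pinned) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> unexpected = new TreeSet<>();
            NodeList all = doc.getElementsByTagName("*"); // every element in the document
            for (int i = 0; i < all.getLength(); i++) {
                String ns = all.item(i).getNamespaceURI();
                // Elements without a namespace are ignored by this check.
                if (ns != null && !pinned.contains(ns)) {
                    unexpected.add(ns);
                }
            }
            return unexpected;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot check namespaces", e);
        }
    }
}
```

A consumer could run this before processing and reject or log documents for which the result is not empty, rather than silently misreading a newer version of the reused markup.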

Even where you can't reuse a public XML schema, you can still look for common, reusable data clumps in your document formats. Let's put it this way: if you have five business processes involving customers and addresses, do you really need to define customer and address five times? Or even want to? Reuse through Composition can and should start inside your enterprise.

Related Patterns

None from this catalog.

Known Uses

  • WSDL very nicely reuses XML Schema by embedding a whole <schema> element in the WSDL document rather than defining its own mechanism for describing acceptable web service message types.
