Architectural Design Patterns for XML Documents
Introduction
No one wants to reinvent the wheel. One way programmers try to reuse good ideas about object design is to look to catalogs of design patterns, most famously the Gang of Four's Design Patterns: Elements of Reusable Object-Oriented Software (Gamma et al.). XML has now been used widely enough that some high-level patterns are starting to emerge. Some patterns revolve around the low-level details of good schema design, like those collected by Dare Obasanjo in "W3C XML Schema Design Patterns"; but when you have a blank sheet of paper in front of you and you're ready to start designing your new XML format, you want patterns to guide you at a higher level. This article attempts to document a few whole-document design patterns that have proven themselves in the field.
Dynamic Document
Abstract
In this pattern the XML is not typed by a DTD or schema; instead, its structure follows the accessors of the underlying program objects. It allows unlimited extension by multiple, uncoordinated parties at the cost of type-checking, and it is simple to implement, with supporting libraries abounding (e.g., Apache Commons for Java; .NET's XML marshalling for C#).
Problem
You need to develop a format quickly, or many different people are contributing on an ad-hoc basis at different times, and it's not possible to have a fixed document design.
Context
This pattern is more common for private or technical formats, such as a server's configuration file or a marshalling format. It is also a good match for Extreme Programming projects, because you can get it working quickly and refactor later to use another mechanism if needed.
Forces
- You need a "quick and dirty" solution.
- You can't know beforehand what extensions will be required, but you know they will be many and created by people other than the original document format creator.
Solution
Don't design a format, and drop validation. Instead, let a technical solution (that is, a marshaller) drive the XML generation: as the data structures in your program change, the generated XML changes with them. In both .NET and Java the marshaller uses reflection and extra metadata (.NET CLR attributes or JavaBean BeanInfo classes) to find the read/write properties of a class. It moves recursively through the object graph, generating a tree of XML elements named after the properties. For example, these two classes:
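// Person.java
public class Person {
    private String name;
    private Address address;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Address getAddress() { return address; }
    public void setAddress(Address address) { this.address = address; }
}

// Address.java
public class Address {
    private String city;
    private String state;
    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }
    public String getState() { return state; }
    public void setState(String state) { this.state = state; }
}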
might be marshalled as
<person>
  <name>Kyle Downey</name>
  <address>
    <city>Forest Hills</city>
    <state>Queens</state>
  </address>
</person>
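To make the marshaller's job concrete, here is a minimal sketch of a reflective marshaller in Java. It is not the Apache Commons or .NET implementation; `BeanMarshaller` is a hypothetical class that assumes simple JavaBeans whose leaf properties are strings, numbers, or booleans. It uses `java.beans.Introspector` to find the read/write properties and names each element after its property.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

public class BeanMarshaller {

    // Hypothetical beans mirroring the Person/Address example above.
    public static class Address {
        private String city;
        private String state;
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
        public String getState() { return state; }
        public void setState(String state) { this.state = state; }
    }

    public static class Person {
        private String name;
        private Address address;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Address getAddress() { return address; }
        public void setAddress(Address address) { this.address = address; }
    }

    /** Marshals a bean graph, naming the root element after the decapitalized class name. */
    public static String marshal(Object bean) {
        StringBuilder out = new StringBuilder();
        String root = Introspector.decapitalize(bean.getClass().getSimpleName());
        try {
            writeElement(out, root, bean);
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot marshal " + bean.getClass(), e);
        }
        return out.toString();
    }

    private static void writeElement(StringBuilder out, String name, Object value)
            throws IntrospectionException, ReflectiveOperationException {
        out.append('<').append(name).append('>');
        if (value instanceof String || value instanceof Number || value instanceof Boolean) {
            // Leaf property: emit its text, escaping XML metacharacters.
            out.append(escape(value.toString()));
        } else {
            // Stop at Object.class so getClass() is not treated as a "class" property.
            BeanInfo info = Introspector.getBeanInfo(value.getClass(), Object.class);
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method read = pd.getReadMethod();
                // Only read/write properties take part; null values are skipped.
                if (read == null || pd.getWriteMethod() == null) continue;
                Object child = read.invoke(value);
                if (child != null) writeElement(out, pd.getName(), child);
            }
        }
        out.append("</").append(name).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Address address = new Address();
        address.setCity("Forest Hills");
        address.setState("Queens");
        Person person = new Person();
        person.setName("Kyle Downey");
        person.setAddress(address);
        System.out.println(marshal(person));
    }
}
```

Element order follows whatever order the introspector reports the properties in, which need not match declaration order. The sketch also ignores collections and cyclic object graphs, both of which a real marshaller has to handle. Renaming `getCity()` would silently rename `<city>`: that is exactly the coupling between object design and document design this pattern trades on.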
Discussion
Before sitting down to do a potentially complex document design, you should always ask yourself whether a dynamic, data-driven format might be sufficient. Most XML-aware development platforms provide at least one library that will take an object and convert it into XML. You've done the object design, and in a couple of lines of code, you've done your document design as well. If you're on a tight deadline, this is a potentially big time-saver for the development team.
But not so fast. Dynamic Document is most likely not an option for you if
- you're designing a long-lived, business-critical exchange format, and so you don't want the format to change whenever you change your object design; or
- you don't trust the producers of the data to get it right, and the cost of a mistake is high. For example, with a document notifying you about inventory changes at a partner's warehouse, the lack of validation is risky.
Related Patterns
None. This is the "zero design pattern design." Once you start to involve other patterns, you're enforcing a human design rather than having a dynamic document.
Known Uses
- Ant build.xml
- Apache Tomcat server.xml
- JDK 1.4 JavaBean XML persistence
- .NET XML Marshalling
- SOAP default encoding
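The JDK 1.4 JavaBean XML persistence listed above is `java.beans.XMLEncoder` and `java.beans.XMLDecoder`. Here is a minimal round-trip sketch. The `XmlPersistenceDemo` wrapper and its one-property `Person` bean are hypothetical, invented for illustration.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class XmlPersistenceDemo {

    // A public bean with a public no-arg constructor, as XMLEncoder requires.
    public static class Person {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** Writes a bean graph to XML using the JDK's built-in long-term persistence. */
    public static String toXml(Object bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(bean);
        } // close() flushes the document to the stream
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Reads the first object back from XML produced by toXml. */
    public static Object fromXml(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.setName("Kyle Downey");
        String xml = toXml(p);
        System.out.println(xml);
        Person copy = (Person) fromXml(xml);
        System.out.println(copy.getName()); // prints Kyle Downey
    }
}
```

XMLEncoder's output records the calls needed to rebuild the object (`<object class="...">` and `<void property="name">` elements) rather than naming elements after properties. It is still a dynamic document: the format is whatever the bean happens to look like.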
Composition
Abstract
Wherever possible, define the format using existing standards, referencing their elements by namespace rather than rolling your own. For example, add metadata to your documents using RDF and the Dublin Core elements rather than inventing your own <author> and <description> tags. This allows the markup to evolve independently, in the hands of the parties who know the business domain best.
Problem
You have an existing or planned document format that provides common types of data using its own, proprietary elements and types, and you're forced to maintain and understand that subset of data yourself, even though you're not a domain specialist.
Context
With all the standardization work out there, just about any business-oriented document problem presents an opportunity for defining some elements with Composition.
Forces
- There is an opportunity to reuse a <simpleType>, <complexType> or <element> from another XML schema.
- You can accept or even want to have the composed data type definition evolve independently of your own efforts.
- Patents or other legal encumbrances do not prevent you from reusing that schema.
Solution
XML namespaces make it very easy to import entire elements from one spec into another. Let's say you're designing a format for capturing use cases. You want to include attribution information: who wrote it, when, etc. You might want to consider using the Dublin Core RDF elements instead of defining your own <author> and other meta-information tags:
<uc:use-case
    xmlns:uc="http://example.com/my/usecase.xsd"
    id="3">
  <uc:metadata>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description>
        <dc:title>Irritate Customer</dc:title>
        <dc:creator>Kyle Downey</dc:creator>
        <dc:date>2002-03-08</dc:date>
        <dc:format>text/xml</dc:format>
        <dc:language>en</dc:language>
        <dc:contributor>Amber Archer Consulting Co., Inc.</dc:contributor>
        <dc:identifier>UC#3</dc:identifier>
      </rdf:Description>
    </rdf:RDF>
  </uc:metadata>
  ...
</uc:use-case>
In your use case schema you would have (in part)
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    targetNamespace="http://example.com/my/usecase.xsd">
  <import
      namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      schemaLocation="http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-rdf.xsd"/>
  <element name="metadata">
    <complexType>
      <sequence>
        <element ref="rdf:RDF"/>
      </sequence>
    </complexType>
  </element>
</schema>
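Because the Dublin Core elements are identified by namespace URI rather than by prefix, a consumer can pick them out without knowing anything about the rest of the use-case format. Here is a minimal DOM sketch in Java. The `DublinCoreReader` class and `dcValue` method are hypothetical, and the code assumes the standard Dublin Core element namespace `http://purl.org/dc/elements/1.1/`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DublinCoreReader {

    // Assumed: the standard Dublin Core element set namespace URI.
    public static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns the text of the first Dublin Core element with this local name, or null. */
    public static String dcValue(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Needed so parsed elements carry namespace URIs for the *NS lookup below.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Match on namespace URI + local name; the prefix in the document is irrelevant.
            NodeList nodes = doc.getElementsByTagNameNS(DC_NS, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<uc:use-case xmlns:uc='http://example.com/my/usecase.xsd' id='3'>"
                + "<uc:metadata>"
                + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
                + " xmlns:dc='" + DC_NS + "'>"
                + "<rdf:Description>"
                + "<dc:title>Irritate Customer</dc:title>"
                + "<dc:creator>Kyle Downey</dc:creator>"
                + "</rdf:Description></rdf:RDF></uc:metadata></uc:use-case>";
        System.out.println(dcValue(xml, "creator")); // prints Kyle Downey
        System.out.println(dcValue(xml, "title"));   // prints Irritate Customer
    }
}
```

A document that binds the same URI to a different prefix, say `xmlns:meta`, is read identically. That prefix-independence is what lets the Dublin Core vocabulary travel between formats that know nothing about each other.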
Discussion
One of the strong arguments for Composition, aside from the well-documented programmer's virtue of laziness, is that you can lean on the more specialized knowledge of others. The people who put together Dublin Core put a lot of thought into how best to represent document metadata; they have been doing it since 1994. Most likely, you've been thinking about how to put meta-information into your document since two paragraphs ago. It's no contest. So your choice is either to get taken down by an angry librarian who's breaking noses and taking names, or to reuse the work. This design pattern recommends the latter.
As RDF and Dublin Core evolve, all you have to do is change the namespace and the import statement to point to a newer version of the schema. That lets you take advantage of the latest and greatest ways of representing metadata, widgets, documents, customers, fixed income instruments, or whatever it is you're reusing, with very little effort. This capacity for concurrent evolution is, however, also the biggest gotcha in Composition. Unless the promoters of your standard have done the right thing and put version information in the namespace and schema URI, there's a risk that users in the field will suddenly start getting a backward-incompatible version 2.0 of the schema and get very angry. So keep an eye on versioning, and if necessary copy the schema to your own namespace and reuse it from there.
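One way to keep an eye on versioning, assuming the standards you reuse carry a version in their namespace URI, is for consumers to reject namespaces they have not been tested against. The following is a sketch of a hypothetical `NamespaceGuard` class. The accepted list is illustrative, and the `2.0` Dublin Core namespace in the usage example is made up, not a real release.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class NamespaceGuard {

    /** Collects every element namespace URI used in the document ("" for no namespace). */
    public static Set<String> namespacesUsed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> found = new TreeSet<>();
            // "*" for both arguments matches every element in every namespace.
            NodeList all = doc.getElementsByTagNameNS("*", "*");
            for (int i = 0; i < all.getLength(); i++) {
                String ns = all.item(i).getNamespaceURI();
                found.add(ns == null ? "" : ns);
            }
            return found;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    /** Returns the namespaces in the document that are not on the accepted list. */
    public static Set<String> unexpectedNamespaces(String xml, Set<String> accepted) {
        Set<String> unexpected = new TreeSet<>(namespacesUsed(xml));
        unexpected.removeAll(accepted);
        return unexpected;
    }

    public static void main(String[] args) {
        Set<String> accepted = Set.of(
                "http://example.com/my/usecase.xsd",
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
                "http://purl.org/dc/elements/1.1/");
        String xml = "<uc:use-case xmlns:uc='http://example.com/my/usecase.xsd'>"
                + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
                + " xmlns:dc='http://purl.org/dc/elements/2.0/'>" // hypothetical future version
                + "<rdf:Description><dc:creator>Kyle Downey</dc:creator></rdf:Description>"
                + "</rdf:RDF></uc:use-case>";
        System.out.println(unexpectedNamespaces(xml, accepted));
        // prints [http://purl.org/dc/elements/2.0/]
    }
}
```

A non-empty result is a signal to check the new version before trusting the data, rather than discovering the incompatibility in production.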
Even where you can't reuse a public XML schema, you can still look for common, reusable data clumps in your document formats. Let's put it this way: if you have five business processes involving customers and addresses, do you really need to define customer and address five times? Or even want to? Reuse through Composition can and should start inside your enterprise.
Related Patterns
None from this catalog.
Known Uses
- WSDL very nicely reuses XML schema by embedding a whole <schema> element in the WSDL document rather than defining its own mechanism for acceptable web service message types.