Design Patterns in XML Applications: Part II
February 16, 2000
Part II: XML-specific patterns
Table of Contents
Patterns are a useful technique for transmitting knowledge about recurrent problems in software development. This article, the second of two complementary pieces (see Part I here), focuses on XML-specific patterns, as opposed to traditional design patterns in XML-specific contexts.
For the first part of this article, some basic knowledge about UML class diagrams will be useful (see our basic UML class diagram guide). For the second part, some basic knowledge of XML DTDs, such as entities, will also be useful.
What are XML patterns?
XML Patterns denotes two kinds of patterns: (1) Program Design Patterns, specifically treating XML-related problems; and (2) Information Structuring Patterns, for the design/implementation of DTDs, schemas etc.
XML patterns of the first kind tend to be compositions and refinements of traditional design patterns. Yet the process of naming and clearly defining them helps in two ways:
- It builds a common base language and base of knowledge for typical XML applications, thus improving understandability, and empowering developers at all levels of expertise.
- It helps XML integrate into the Object Oriented mainstream.
XML patterns of the second kind, those for information design, are focused on finding solutions for common problems in the design of document type definitions (DTDs).
The number of XML patterns is growing quickly, so choosing which ones to present has not been an easy task. I have decided to present here common patterns in the three categories that seem to be most stable in the XML patterns arena: Patterns for Program Design, Patterns for DTD design, and Patterns for DTD Implementation.
XML patterns is a formidable subject, which these articles can only hope to introduce. This article is therefore an invitation to explore patterns further, rather than a catalog of XML patterns. This exploration is not without its pitfalls, which is why I have included a short guide to common misconceptions and warnings at the end of this article. I hope that potential pattern writers can use it to build a clearer common base of knowledge.
Our tour of XML patterns starts with XML patterns in processing applications, examining the "XMLable" pattern.
XML application design patterns (abbreviated here as "XADP") are named, reusable solutions for common problems at the application level. They are often refinements of traditional patterns.
Because of their nature, XADPs can be easily and neatly expressed in the same way traditional patterns are usually presented: in sections for the name, synopsis, context, solution, consequences, and related patterns.
The following pattern is a typical XADP, called the "XMLable pattern." It has been successfully used in a number of applications, and tackles one common problem in XML-aware applications: the construction of internal representations of XML data as meaningful objects.
XMLable
Also known as "XML-reader/writer"
Originator: Fabio Arciniegas A.
Synopsis
The XMLable pattern defines a solution to managing information that is persisted as XML data, but must also be managed as meaningful objects (i.e., not as a general data structure such as the DOM) inside an application.
Context
Suppose you are writing an e-mail program that uses XML documents for the persistence of the messages. This is pretty useful since you can do things like apply various stylesheets to these documents and get all sorts of nice presentations for them. But you also need to upload and manage that information into your program: you need objects that represent your messages.
Keeping the DOM representation of every object can be very memory-intensive, especially when you are managing a large number of messages. More importantly, DOM objects contain no semantics whatsoever about being a message. There is no such thing as an interface for other objects to interact with it as an e-mail message (no getDate(), just plain DOM manipulation). Choosing to maintain the DOM representation of hundreds of messages is in most cases a bad design decision; using it would probably lead to a poorly structured, hard-to-maintain program.
The XMLable pattern addresses the problem of how to create e-mail objects by using the data contained in the XML document without having to keep the DOM representation in memory.
The solution that the pattern suggests is to provide the emailMessage class with a partner class, emailXMLPersistenceManager, whose sole responsibility is to make the object persist in an XML representation. Whether recovering the state of the object or serializing it in XML, it is the PersistenceManager and not the object itself that handles this activity.
The considerations that lead to the general solution proposed by the XMLable pattern are:
- Multiple objects, whose data is gathered from XML documents, need to be manipulated internally.
- Memory restrictions make DOM prohibitive.
- Design and program quality impose the need to represent the data as something more meaningful to the application domain than the DOM tree.
This figure shows a class diagram depicting the classes and interfaces participating in the XMLable pattern. The descriptions of the roles played by these classes in the pattern are below:
- A container responsible for the creation of the XMLableConcreteClass instances. In the e-mail example, this is the EmailProgram class.
- XMLable: Gathers (provides the base class for) the different classes that can be made persistent through the corresponding ConcreteXMLPersistenceMgr.
- XMLableConcreteClass: The actual class whose instances will be registered with the ConcreteXMLPersistenceMgr and finally saved as XML. In the e-mail example, this is the EmailMessage class.
- XMLPersistenceMgr: A simple interface declaring the methods that provide XML persistence to an object. It also declares a method to register the concrete XMLPersistenceMgr object with the XMLable object.
- ConcreteXMLPersistenceMgr: This is the core of the pattern. The class implements the XMLPersistenceMgr interface. It is also responsible for constructing the XMLable object from XML documents. To do that, the class implements the DocumentHandler methods (defined by SAX) so that it can update the registered class from the XML source.
- DocumentHandler (defined in SAX): The ConcreteXMLPersistenceMgr needs to be informed of basic parsing events. To that end, it implements this interface and registers with the SAX parser. The parser uses the instance to report basic document-related events such as the start and end of elements.
Consequences
- All the complexity involved in managing the persistence of the object is shifted to the PersistenceMgr.
- There is a tight coupling between the XMLable class and the PersistenceMgr.
- The size of the XMLable objects is smaller. This is very useful in applications handling many instances of the XMLable class.
- Responsibility for instantiation and update of the XMLable object is well separated, allowing for the creation and manipulation of the object even outside of the XML persistence process.
Related Patterns
- High Cohesion: This pattern encourages putting specialized methods in special-purpose classes. The PersistenceMgr is a good example of High Cohesion.
- Singleton: The Singleton pattern ensures that only one instance of a class is created. This can be the case for the PersistenceMgr class if, among other reasons, concurrency concerns must be kept to a minimum.
- Balking: If an object's method is called when the object is not in an appropriate state to execute it, the method returns without doing anything. This pattern is useful for systems that implement the PersistenceMgr as a Singleton, but where the client may start concurrent requests to save XMLable objects.
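The roles described for the XMLable pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the originator's implementation: it uses Python's xml.sax ContentHandler (the SAX2 counterpart of DocumentHandler), and the element names sender, date, and body, as well as the register, load, and save method names, are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape


class EmailMessage:
    """Domain object (XMLableConcreteClass role); it holds no DOM tree."""

    def __init__(self, sender="", date="", body=""):
        self.sender = sender
        self.date = date
        self.body = body

    def get_date(self):
        return self.date


class EmailXMLPersistenceManager(ContentHandler):
    """ConcreteXMLPersistenceMgr role: loads and saves the registered message."""

    FIELDS = ("sender", "date", "body")  # hypothetical element names

    def __init__(self):
        super().__init__()
        self._message = None
        self._field = None
        self._chunks = []

    def register(self, message):
        self._message = message

    # SAX callbacks: update the registered object as parsing events arrive.
    def startElement(self, name, attrs):
        if name in self.FIELDS:
            self._field = name
            self._chunks = []

    def characters(self, content):
        if self._field is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._field:
            setattr(self._message, name, "".join(self._chunks))
            self._field = None

    def load(self, xml_text):
        xml.sax.parseString(xml_text.encode("utf-8"), self)
        return self._message

    def save(self):
        parts = ["<email>"]
        for field in self.FIELDS:
            value = escape(getattr(self._message, field))
            parts.append(f"<{field}>{value}</{field}>")
        parts.append("</email>")
        return "".join(parts)


SOURCE = (
    "<email><sender>ann@example.org</sender>"
    "<date>2000-02-16</date><body>Hi &amp; bye</body></email>"
)
manager = EmailXMLPersistenceManager()
manager.register(EmailMessage())
message = manager.load(SOURCE)
print(message.get_date())
print(manager.save())
```

The message object never sees the parser: all parsing and serialization logic lives in the manager, which is the trade-off listed among the consequences (a smaller domain object, at the cost of tight coupling between the two classes).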
In this section we saw a common example of an XML pattern for XML processing applications. In the next section, we will study XML patterns for DTD structuring.
XML Patterns in DTD structure
These patterns are named solutions to recurring problems in the overall structure of document types. Note that the term DTD here is applied in the sense of document type definition. These patterns are not restricted to any given form of XML schema definition.
DTD structure patterns are usually smaller than application design patterns. Therefore, two examples will be presented. For more information, see the links in the resources section.
Choice Reducing Container
Originator: Toivo Lainevool
Synopsis
When creating large DTDs with many logical units, authors might be required to learn a large number of these units to know how to use the DTD. Reducing the number of choices the author has to make at any point in the DTD (by grouping related elements beneath newly introduced elements) will reduce the burden on the author.
Context
In a DTD with many logical units, a user of a document can be overwhelmed with the number of choices that have to be made. With many options users have a difficult time knowing how to compose all of the elements available. This is common in large, general-purpose DTDs where many logical units are presented.
Forces
- Either because of the nature of the data to be represented, or because of the intention of making the DTD applicable in many situations, large numbers of logical units need to appear in the DTD.
- Several of the elements can be naturally grouped as members of a higher abstraction (e.g., "magnolia" and "rose" under "flowers").
- The user's learning process needs to be simplified by presenting him or her with a small number of choices at each point.
Here is a DTD fragment that presents a lot of choice to the author:
<!ELEMENT Doc (Para | OrderedList | UnorderedList | Figure | Artwork )+>
Here the author has five different elements to choose from after creating the Doc element. This choice could be limited by introducing new elements, and grouping some existing elements together as children of the new elements, like this:
<!ELEMENT Doc (Para | List | Illustration )+>
<!ELEMENT List (OrderedList | UnorderedList )>
<!ELEMENT Illustration (Figure | Artwork )>
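The effect of the grouping can be made concrete by counting the alternatives an author faces at each element. The following is a minimal sketch, not a DTD parser: the helper choices_per_element is hypothetical and only understands a single flat choice group such as the ones in this example.

```python
import re


def choices_per_element(dtd_text):
    """Map each declared element to the number of alternatives in its
    content model. Only flat (a | b | c) groups are handled."""
    counts = {}
    pattern = r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)"
    for name, model in re.findall(pattern, dtd_text):
        counts[name] = len([part for part in model.split("|") if part.strip()])
    return counts


FLAT = "<!ELEMENT Doc (Para | OrderedList | UnorderedList | Figure | Artwork )+>"
GROUPED = """
<!ELEMENT Doc (Para | List | Illustration )+>
<!ELEMENT List (OrderedList | UnorderedList )>
<!ELEMENT Illustration (Figure | Artwork )>
"""

print(choices_per_element(FLAT))     # {'Doc': 5}
print(choices_per_element(GROUPED))  # {'Doc': 3, 'List': 2, 'Illustration': 2}
```

The total number of leaf elements is unchanged; what shrinks is the number of options presented at any single point, which is the burden the pattern aims to reduce.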
Cross-Cutting Metadata
Also known as "Factoring Metadata"
Originator: Fabio Arciniegas A.
Synopsis
During the definition of a DTD, it is not unusual to find several elements sharing a common set of metadata needs. The Cross-Cutting Metadata Pattern identifies such common subsets and encapsulates them, in order to make a clearer DTD.
Context
Elements often have associated metadata (e.g., a unique identifier). Furthermore, many elements can share the same metadata needs. This is often the case in DTDs for element collections. Suppose you are developing a DTD for the items of a music and video shop. Your items, represented as elements, are bound to have many metadata needs in common: an identifier, an availability status, or maybe a recommendation status. The structure proposed by the Cross-Cutting Metadata Pattern is to encapsulate these common metadata needs (very often in a parameter entity), leading to a better organized and more maintainable DTD.
Forces
The needs that lead to the use of this pattern are straightforward:
- There are a number of elements that have metadata requirements.
- These elements share a subset of those requirements.
- The number of elements and the size of the subset are big enough to make the inclusion of a parameter entity (or an attribute group in XML Schema) an improvement in readability and maintainability, instead of adding "bloat." For example, if there are only two elements, and the only thing they share is an ID, introducing an extra construct is not an improvement.
Cross-Cutting Metadata takes the common subset of metadata needs and expresses it in whatever mechanism the schema definition language provides for encapsulation (e.g., parameter entities in XML DTDs). It then includes this construct in all the elements that share it. The pattern simply factors the metadata out of several elements. Even though metadata is often expressed in attributes, the pattern can also be applied if the metadata is in the form of elements.
Consequences
- Common metadata is easier to localize, and thus easier to modify.
- When applied to a large number of elements, readability is greatly improved.
- Reusability of metadata declarations is easier to achieve.
This simple example deals with the music and video store DTD mentioned above. Consider the initial declarations:
<!ATTLIST video
  id ID #REQUIRED
  available (yes|no|onrequest) "onrequest"
  onSale CDATA #FIXED "yes">
<!ATTLIST CD
  id ID #REQUIRED
  available (yes|no|onrequest) "yes"
  recommendation CDATA #IMPLIED >
From these declarations we can derive a parameter entity using the Cross-Cutting Metadata pattern:
<!ENTITY % cross-cutting-metadata
  "id ID #REQUIRED
   available (yes|no|onrequest) 'onrequest'" >
<!ATTLIST video
  %cross-cutting-metadata;
  onSale CDATA #FIXED "yes" >
<!ATTLIST CD
  %cross-cutting-metadata;
  recommendation CDATA #IMPLIED >
We can then simply include this entity in the attribute-list declarations of all the elements that share it.
Not only has readability improved, but maintainability is higher as well. Now, when we need to add additional metadata to each element (e.g., "onSale"), we can easily and safely add it without enduring the error-prone process of including it manually on each element type.
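To see what a DTD processor makes of the factored form, the parameter entity can be expanded by hand. The sketch below is a rough textual approximation, not a conforming DTD processor: the helper expand_parameter_entities is hypothetical, handles only double-quoted parameter entity declarations, and ignores every other DTD construct.

```python
import re

ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')


def expand_parameter_entities(dtd_text):
    """Return the remaining declarations with %name; references replaced
    by the entity's text (padded with spaces) and whitespace normalized."""
    entities = dict(ENTITY_DECL.findall(dtd_text))
    remaining = ENTITY_DECL.sub("", dtd_text)
    expanded = re.sub(
        r"%([\w.-]+);",
        lambda match: f" {entities[match.group(1)]} ",
        remaining,
    )
    return [" ".join(decl.split()) for decl in re.findall(r"<![^>]*>", expanded)]


DTD = """
<!ENTITY % cross-cutting-metadata
  "id ID #REQUIRED
   available (yes|no|onrequest) 'onrequest'">
<!ATTLIST video %cross-cutting-metadata; onSale CDATA #FIXED "yes">
<!ATTLIST CD %cross-cutting-metadata; recommendation CDATA #IMPLIED>
"""

for declaration in expand_parameter_entities(DTD):
    print(declaration)
```

Both attribute lists end up with the same id and available declarations, so a change to the entity reaches every element that includes it. Note that XML only allows parameter entity references inside declarations in the external DTD subset, so a fragment like this belongs in a separate DTD file rather than in a document's internal subset.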
XML Patterns in Element Definition
Arguably, the most widespread kind of XML patterns are those related to DTD content. These patterns are named solutions to recurring problems in the design of element types.
Not all patterns can or should be expressed in the same way. For instance, traditional behavioral patterns commonly have a different expression from data definition patterns. In this section, I opted to keep the layout for the patterns as defined by Liam Quin.
Running Text
Originator: Liam Quin
This pattern is included in its original formulation.
Synopsis
The Running Text Pattern is used for general textual content that may contain markup at the phrase, word, or symbol level, but not at the block level.
Actors
The Running Text Pattern has these participants:
- Block Level Elements: The environment in which the pattern occurs.
- Internal Markup: Markup that can occur within Running Text.
- Running Text Definition: The implementation of Running Text.
Running Text is usually represented in a Document Type Definition as a Parameter Entity. The actual elements listed will vary from DTD to DTD, depending on the application; the Pattern specifies only the use of the entity RunningText:
<!ENTITY % RunningText ' #PCDATA|Quote|Emphasis|MathML|Phrase|BibRef| FootNoteReference ' >
The pattern is used in the content model of other elements:
<!ELEMENT FootnoteBody (%RunningText;)* >
The purpose of a single definition for Running Text is two-fold: firstly, to encapsulate the concept of generic running text, making the intent of a document type definition clearer; secondly, to ensure that the same set of basic elements is allowed everywhere text is allowed.
Additional elements can be added for a specific situation as follows:
<!ELEMENT PlaceName (%RunningText;|PlaceAlias|GridReference)* >
Processing
This pattern does not require special processing. It is normally only seen by a validating XML processor.
Variations
In a complex Document Type Definition, it may be convenient to include other parameter entities in the definition of RunningText:
<!ENTITY % RunningText ' #PCDATA|Quote|Emphasis|Phrase|BibRef| %elements.footnotes;|%elements.MathML; ' >
Marker Attribute
Originator: Fabio Arciniegas A.
Synopsis
The Marker Attribute Pattern is used when certain elements need to be marked via an attribute so they can be processed in a different way by a style sheet/program that recognizes the mark.
Actors
The Marker Attribute Pattern has three participants:
- Marker Attribute: The marker is an attribute whose only purpose is to signal a binary state. If the attribute is present, the element must be treated differently.
- Marked Element: The element that may contain the Marker Attribute.
- Processing Application: Responsible for performing the special action when the mark is encountered. This is usually encapsulated in a style sheet.
The markup necessary for this pattern is reduced to an attribute declaration:
<!ELEMENT video (title,artist,whatnot)>
<!ATTLIST video onSale CDATA #FIXED "yes">
and, possibly, the appearance of the attribute in the XML instance:
<video onSale="yes"> ...
Processing
As mentioned above, a key characteristic of this pattern lies outside the XML document: the special behavior derived from the marking is usually achieved by means of a style sheet. The following example shows a simple case.
Example
A Marker Attribute for items on sale can be applied to the elements of a hypothetical DTD for videos as shown above. A simple XSLT style sheet can take care of a special presentation for the marked elements:
<xsl:if test="@onSale">
  <h4>
    <xsl:value-of select="artist"/> is on sale.
  </h4>
</xsl:if>
<!-- handle the rest of the element -->
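The Processing Application does not have to be a style sheet. As a rough equivalent of the XSLT fragment above, the following Python sketch checks for the marker with the standard library's xml.etree.ElementTree. The catalog wrapper element, the sample titles and artists, and the sale_notices function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <video onSale="yes"><title>Concert</title><artist>Some Artist</artist></video>
  <video><title>Recital</title><artist>Another Artist</artist></video>
</catalog>
"""


def sale_notices(root):
    """Build one notice per video element that carries the marker attribute."""
    notices = []
    for video in root.iter("video"):
        # Only the presence of the attribute matters, not its value.
        if "onSale" in video.attrib:
            notices.append(f"{video.findtext('artist')} is on sale.")
    return notices


print(sale_notices(ET.fromstring(CATALOG)))  # ['Some Artist is on sale.']
```

ElementTree does not validate or read attribute defaults from a DTD, so in this sketch the marker is only seen when it is written explicitly in the instance. Attribute names are case-sensitive, so onSale and onsale are different marks.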
Advice for the Use and Creation of XML Patterns
A Little Good Advice
During the use and creation of patterns, several misconceptions and pitfalls can be encountered. Since XML patterns are no exception, I would like to finish by briefly highlighting some of the main trouble spots. For more advice on healthy pattern use, I recommend John Vlissides' book "Pattern Hatching" (see References).
Patterns Are Not the Holy Grail
Patterns are a powerful way to communicate expertise: they create a common design language, help make your system more understandable to others, and so on. But they are not a replacement for creativity, nor do they automatically guarantee quality. Patterns are just another tool in your box: learn them, use them, enjoy them, but don't overestimate them.
Tautologies Are Not Patterns
This phenomenon seems to have cooled down in the traditional pattern world, but it appears to still be a problem in the XML patterns arena. XML "patterns" that merely state a tautology like "use an attribute where an attribute is needed" are not useful for anyone. This problem was pointed out a long time ago by Rick Jelliffe, but still seems common enough to merit mentioning here.
Patterns Are Not Restricted to Particular Aspects of XML Applications
Depending on our personal background, we tend to see some areas as more suitable for pattern creation than others. Some people take this to extremes, claiming XML patterns can only be used in particular situations. This is a mistake: opportunities to help others gain expertise about recurrent problems and solutions arise in every area. Patterns are a great tool, so we need not restrain ourselves. Let's use them wherever they are useful!
This concludes our brief introduction to XML Patterns. Please write to me (firstname.lastname@example.org) if you have questions, suggestions, or want to discuss further work in this field.
Acknowledgements and References
I would like to thank Liam Quin, Rubby Casallas, and Toivo Lainevool for their contributions to this article.
Erich Gamma, Richard Helm, Ralph Johnson & John Vlissides, 1995, Design Patterns: Elements of Reusable Object-Oriented Software.
John Vlissides, 1997, Pattern Hatching.
Sherman R. Alpert, Kyle Brown, Bobby Woolf, 1998, The Design Patterns Smalltalk Companion.
Ian Graham and Liam Quin's web pages "Introduction to XML Design Patterns" at http://www.groveware.com/xmlbook/patterns.html
Rick Jelliffe, 1998, The XML & SGML Cookbook: Recipes for Structured Information, Charles F. Goldfarb Series on Open Information Management, ISBN 0-13-614223-0.
More XML patterns can be found at Toivo Lainevool's forthcoming site, http://www.xmlpatterns.com/