XML.com: XML From the Inside Out

Architectural Design Patterns for XML Documents
by Kyle Downey

Self-Documenting Files

Abstract

Include as part of the document format elements that annotate the content.

Problem

Your human-readable format is so cryptic that it makes grown hackers cry. Consider this fragment of Perl code rendered as XML, which supposedly prints the entire Linux kernel when run:


   <perlml>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
   </perlml>

Note how it's much improved with just a little annotation:

<perlml>
   <annotation>
     You're not expected to understand this.
   </annotation>
   <code>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
   </code>
</perlml>

Context

This pattern applies to documents that are meant to be viewed by people, or at least post-processed to generate documentation for people. Internal data structure formats, such as on-the-wire marshaling formats, generally don't need annotation.

Forces

  • You're generating complex XML content that needs to be understood by people, or converted into some format for their viewing.
  • The information in the document itself is not enough to be comprehensible.

Solution

Add an element or elements to your XML schema to include documentation. Generally you'll want to tie the documentation to each significant element, so consider a base type that the types of those elements can extend -- for example, documentableType -- like this:

    <complexType name="documentableType">
      <sequence>
        <element name="annotation" type="string"/>
      </sequence>
    </complexType> 
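
To show how the base type might be used, here is a minimal sketch of a complete schema in which a significant element's type extends documentableType. The widget element, widgetType, and partNumber are hypothetical names invented for illustration, and the xs: prefix is simply one common way to bind the XML Schema namespace.

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <!-- Base type from the Solution: carries the annotation -->
      <xs:complexType name="documentableType">
        <xs:sequence>
          <xs:element name="annotation" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>

      <!-- Hypothetical type that inherits the annotation and adds its own content -->
      <xs:complexType name="widgetType">
        <xs:complexContent>
          <xs:extension base="documentableType">
            <xs:sequence>
              <xs:element name="partNumber" type="xs:string"/>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>

      <xs:element name="widget" type="widgetType"/>
    </xs:schema>

Because an extension appends its content after the base type's sequence, a conforming instance puts the annotation first:

    <widget>
      <annotation>Check the press manual before changing partNumber.</annotation>
      <partNumber>W-1001</partNumber>
    </widget>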

Discussion

XML comments are great, but if you find that they're becoming mandatory for users to decode your XML documents, maybe it's time to allow those annotations to be part of the XML itself. Probably the biggest win you get out of this (aside from standardizing where the comments go and how they're formatted, using all the powerful features of XML Schema) is the ability to apply the rest of the XML toolkit to your documents. You could, for instance, write a "widgetdoc" XSLT stylesheet that takes your widget.xml files and converts them into an HTML document describing the widget. That document would include all the extra annotations, which might not mean much to the automatic widget-stamping machine that was reading the XML before but will mean a lot to anyone debugging the machine's software.
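
A "widgetdoc" stylesheet along those lines could be sketched in XSLT 1.0 as follows. It assumes a hypothetical widget.xml whose root widget element has an annotation child and a partNumber child; neither name comes from a real widget format.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>

      <!-- One HTML page per widget document -->
      <xsl:template match="/widget">
        <html>
          <head>
            <title>Widget <xsl:value-of select="partNumber"/></title>
          </head>
          <body>
            <h1>Widget <xsl:value-of select="partNumber"/></h1>
            <xsl:apply-templates select="annotation"/>
          </body>
        </html>
      </xsl:template>

      <!-- Each annotation becomes a paragraph of human-readable notes -->
      <xsl:template match="annotation">
        <p class="annotation"><xsl:value-of select="."/></p>
      </xsl:template>
    </xsl:stylesheet>

The widget-stamping machine can keep ignoring the annotation element, while any XSLT 1.0 processor can turn the same file into documentation.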

Related Patterns

There's a nice combination of Composition and Self-Documenting Files. There are two well-known formats for documentation in XML: DocBook and XHTML. DocBook is specialized for technical documentation, and there are powerful stylesheets out there for converting it to HTML and PDF. XHTML is, obviously, very good for online presentation. So if you want to be able to generate professional-quality documentation with links and images from your own XML format, you should definitely consider embedding XHTML or DocBook XML.
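
As one possible sketch of that combination, an annotation type could allow mixed text and any elements from the XHTML namespace. The type name richAnnotationType is hypothetical, and lax processing is just one reasonable choice: XHTML content is validated only when an XHTML schema happens to be available.

    <xs:complexType name="richAnnotationType" mixed="true"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:sequence>
        <!-- Accept any XHTML element, in any number -->
        <xs:any namespace="http://www.w3.org/1999/xhtml"
                processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

An annotation element declared with this type could then carry links and other markup:

    <annotation xmlns:h="http://www.w3.org/1999/xhtml">
      <h:p>See the <h:a href="press-manual.html">press manual</h:a>
      before changing this part.</h:p>
    </annotation>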

Known Uses

  • XML Schema has annotations, and you can convert them to HTML using xs3p, a very snazzy schemadoc tool
  • WSDL

Multipart Files

Abstract

Define an explicit mechanism for splitting content into multiple files: a primary document and satellite documents that hold faster-changing components or sections of content shared with other primary documents.

Problem

Your documents have become large and unwieldy, and you want to share pieces of them.

Context

This pattern can apply to just about any format, but it seems to be more common in the technical arena.

Forces

  • As documents grow in size and complexity, and as there are more documents that can overlap, this pattern becomes more appealing.
  • Pushing against its use, security and the handling of absolute versus relative URIs become issues for anyone processing the format. If that's too complicated for your taste, or if you're concerned about a cracker manipulating this facility to pull in content he or she should not have access to, you might want to disallow inclusions.

Solution

Add to your schema an <import> or <include> element that takes an href attribute which can be any valid relative or absolute URI. Compliant processors for your format will load and incorporate valid subdocuments in your format from the URI.
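
A minimal sketch of such a declaration, using XML Schema's built-in anyURI type for the href attribute (the xs: prefix is an assumption about how the schema namespace is bound):

    <xs:element name="include"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:complexType>
        <!-- anyURI accepts both relative and absolute references -->
        <xs:attribute name="href" type="xs:anyURI" use="required"/>
      </xs:complexType>
    </xs:element>

A primary document in a hypothetical catalog format could then pull in satellite files alongside its own content; the file names and the example.com URI below are made up for illustration:

    <catalog>
      <include href="common/fasteners.xml"/>
      <include href="http://example.com/parts/gears.xml"/>
      <widget>
        <partNumber>W-1001</partNumber>
      </widget>
    </catalog>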

SOAP 1.1 with Attachments takes an interesting alternative approach to this problem, using Composition along the way. It co-opts the pre-existing MIME standard and allows a SOAP message to travel inside a MIME multipart/related message, with the SOAP XML message as the root part and the other parts linked to it. This allows SOAP to behave something like the FTP protocol with its separate "control" and "data" streams. You can send metadata about binary content and directives for what the recipient should do with it as part of the XML message, and just attach the content directly to the message.
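
The XML part of such a message might look like the sketch below. The storeImage, description, and image elements and the example.com namespace are hypothetical; the cid: URI is assumed to match the Content-ID header of the MIME part that carries the binary data.

    <SOAP-ENV:Envelope
        xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
      <SOAP-ENV:Body>
        <storeImage xmlns="http://example.com/widgets">
          <!-- Metadata and directives travel in the XML message -->
          <description>Photo of widget W-1001</description>
          <!-- The binary content lives in another MIME part, found by Content-ID -->
          <image href="cid:widget-photo@example.com"/>
        </storeImage>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>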

Discussion

From #include to the humble href in HTML, systems abound with ways to pull together content from multiple locations. This makes documents more maintainable and encourages basic reuse of common components, whether they're shared stylesheet rules or whole XML schemas. While it may seem hard to find instances where you wouldn't want to allow sharing of document parts and file composition, there are potential complexity and security issues with allowing inclusions, as noted above in Forces.

Related Patterns

You might want to make your Self-Documenting Files refer to external documents rather than embedding them, and you can use Composition by reusing the W3C standards for file inclusion: XInclude and XML Base. But if you need to have different meanings for including other files (as XSLT does with its <import> and <include> elements), you might still have to roll your own.
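
Reusing the W3C standards could look like the following sketch, in which a hypothetical catalog document resolves its inclusions against an xml:base URI. The file names and the example.com URI are invented for illustration, and the fallback content is only one way to handle a missing file.

    <catalog xmlns:xi="http://www.w3.org/2001/XInclude"
             xml:base="http://example.com/parts/">
      <!-- Resolved as http://example.com/parts/fasteners.xml -->
      <xi:include href="fasteners.xml"/>
      <xi:include href="gears.xml">
        <!-- Used only if gears.xml cannot be retrieved -->
        <xi:fallback>
          <note>gears.xml was unavailable</note>
        </xi:fallback>
      </xi:include>
    </catalog>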

Known Uses

  • XSLT
  • XML Schemas
  • WSDL
  • SOAP with Attachments
