Architectural Design Patterns for XML Documents
by Kyle Downey
Self-Documenting Files
Abstract
Include as part of the document format elements that annotate the content.
Problem
Your human-readable format is so cryptic that it makes grown hackers cry. Consider this fragment of Perl code, rendered as XML, that supposedly prints the entire Linux kernel when run:
<perlml>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
</perlml>
Note how it's much improved with just a little annotation:
<perlml>
<annotation>
You're not expected to understand this.
</annotation>
<code>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
</code>
</perlml>
Context
This pattern applies to documents that are meant to be read by people, or at least post-processed to generate documentation for people. Internal data formats, such as on-the-wire marshaling, generally don't need annotation.
Forces
- You're generating complex XML content that needs to be understood by people, or converted into some format for their viewing.
- The information in the document itself is not enough to be comprehensible.
Solution
Add an element or elements to your XML schema to include
documentation. Generally you'll want to somehow tie the
documentation to each significant element, so you could consider a
base type -- for example, documentableType -- like
this:
<complexType name="documentableType">
<sequence>
<element name="annotation" type="string"/>
</sequence>
</complexType>
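To see what documents built on such a base type might look like, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The widget and part element names, and the DOCUMENTABLE set, are hypothetical and chosen only for illustration. The check simply reports which of those elements lack an annotation child.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: <widget> and <part> are assumed to be
# elements whose types extend documentableType, so each may carry an
# <annotation> child.
SAMPLE = """\
<widget name="sprocket">
  <annotation>Drives the main conveyor.</annotation>
  <part id="p1">
    <annotation>Replace every 500 hours.</annotation>
  </part>
  <part id="p2"/>
</widget>
"""

DOCUMENTABLE = {"widget", "part"}  # assumed set of documentable element names


def missing_annotations(xml_text):
    """Return labels of documentable elements with no non-empty annotation."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        if elem.tag not in DOCUMENTABLE:
            continue
        note = elem.findtext("annotation")  # direct child only
        if note is None or not note.strip():
            missing.append(elem.get("id") or elem.get("name") or elem.tag)
    return missing


if __name__ == "__main__":
    print(missing_annotations(SAMPLE))
```

A schema validator could enforce the same rule. The point here is only that annotations stored as elements are easy for ordinary tools to find.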
Discussion
XML comments are great, but if users can't decode your XML documents without them, maybe it's time to make those annotations part of the XML itself. Probably the biggest win (aside from standardizing where the comments go and how they're formatted, using all the powerful features of XML Schema) is the ability to apply the rest of the XML toolkit to your documents. You could, for instance, write a "widgetdoc" XSLT stylesheet that converts your widget.xml files into an HTML document describing the widget. It would include all the extra annotations, which might not mean much to the automatic widget-stamping machine that was reading the XML before but will mean a lot to anyone debugging the machine's software.
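The "widgetdoc" idea can be sketched without XSLT as well. The following is a rough Python stand-in, not a stylesheet, and it uses only the standard library. The widget.xml content, the element names, and the widgetdoc function are assumptions made up for this example.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical widget.xml content; only the <annotation> convention comes
# from the pattern, the other names are invented for illustration.
WIDGET_XML = """\
<widget name="sprocket">
  <annotation>Drives the main conveyor.</annotation>
  <part id="p1">
    <annotation>Replace every 500 hours.</annotation>
  </part>
  <part id="p2"/>
</widget>
"""


def widgetdoc(xml_text):
    """Render each annotated element as an HTML heading plus a paragraph."""
    root = ET.fromstring(xml_text)
    lines = ["<html><body>"]
    for elem in root.iter():
        note = elem.findtext("annotation")
        if note is None:
            continue  # unannotated elements, and the annotations themselves
        label = elem.get("name") or elem.get("id") or elem.tag
        lines.append("<h2>%s</h2>" % html.escape(label))
        lines.append("<p>%s</p>" % html.escape(note.strip()))
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(widgetdoc(WIDGET_XML))
```

The widget-stamping machine can keep ignoring the annotation elements, while the same file now doubles as the source for its own documentation.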
Related Patterns
There's a nice combination of Composition and
Self-Documenting Files. There are two well-known
formats for documentation in XML: DocBook and XHTML. DocBook is
specialized for technical documentation, and there are powerful
stylesheets out there for converting it to HTML and PDF. XHTML
is, obviously, very good for online presentation. So if you want
to be able to generate professional-quality documentation with
links and images from your own XML format, you should definitely
consider embedding XHTML or DocBook XML.
Known Uses
- XML Schema has annotations, and you can convert them to HTML using xs3p, a very snazzy schemadoc tool
- WSDL
Multipart Files
Abstract
Define an explicit mechanism for splitting content into multiple files: a primary document and satellite ones that represent faster changing components or sections of content shared with other primary documents.
Problem
Your documents have become large and unwieldy, and you want to share pieces of them.
Context
This pattern can apply to just about any format, but it seems to be more common in the technical arena.
Forces
- As documents grow in size and complexity, and as there are more documents that can overlap, this pattern becomes more appealing.
- Pushing against use, security and the handling of absolute versus relative URIs become issues for anyone processing the format. If that is too complicated for your taste, or if you are concerned about a cracker manipulating this facility to pull in content he or she should not have access to, you might want to disallow inclusions.
Solution
Add to your schema an <import> or <include> element
that takes an href attribute which can be any valid
relative or absolute URI. Compliant processors for your format
will load and incorporate valid subdocuments in your format from
the URI.
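A compliant processor along these lines might be sketched as follows in Python. Several choices here are assumptions for illustration, not part of the pattern: the element is named include, the children of the included root are spliced in place of the include element, loading is delegated to a caller-supplied function, and inclusions are confined to a single URI prefix. The prefix check is a crude guard and not a complete defense.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical documents, keyed by absolute URI, standing in for real files.
DOCS = {
    "file:///docs/main.xml":
        '<widget><include href="parts/gear.xml"/><part id="p2"/></widget>',
    "file:///docs/parts/gear.xml":
        '<parts><part id="p1"/></parts>',
}


def expand_includes(uri, load, allowed_prefix, _seen=()):
    """Parse the document at `uri` and splice in every <include href="..."/>.

    `load` maps an absolute URI to XML text; `_seen` tracks the chain of
    documents currently being expanded so that cycles are rejected.
    """
    if not uri.startswith(allowed_prefix):
        raise ValueError("inclusion outside allowed location: " + uri)
    if uri in _seen:
        raise ValueError("circular inclusion: " + uri)
    root = ET.fromstring(load(uri))
    _splice(root, uri, load, allowed_prefix, _seen + (uri,))
    return root


def _splice(parent, base, load, allowed_prefix, seen):
    new_children = []
    for child in list(parent):
        if child.tag == "include":
            # Relative hrefs are resolved against the including document.
            target = urljoin(base, child.get("href"))
            sub = expand_includes(target, load, allowed_prefix, seen)
            new_children.extend(list(sub))  # children of the included root
        else:
            _splice(child, base, load, allowed_prefix, seen)
            new_children.append(child)
    parent[:] = new_children


if __name__ == "__main__":
    root = expand_includes("file:///docs/main.xml", DOCS.__getitem__,
                           "file:///docs/")
    print(ET.tostring(root, encoding="unicode"))
```

Passing the loader in keeps the policy decision (local files only, a fixed set of hosts, no inclusions at all) with the application rather than with the format.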
SOAP 1.1 with Attachments takes an interesting alternative approach to this problem, using Composition along the way. It co-opts the pre-existing MIME standard and allows a SOAP message to travel as a MIME multipart/related package, with the SOAP XML message as the initial part and the other parts linked to it. This allows SOAP to behave something like the FTP protocol, with separate "control" and "data" streams. You can send metadata about binary content, and directives for what the recipient should do with it, as part of the XML message and just attach the content directly to the message.
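The shape of such a package can be sketched with Python's standard email library. This is an illustration of the MIME structure only, not a working SOAP client. The storePhoto body, the Content-ID values, and the build_message helper are hypothetical.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical SOAP 1.1 envelope; the body elements are invented, and the
# href points at the attachment by its Content-ID using the cid: scheme.
ENVELOPE = """\
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <storePhoto>
      <caption>Widget prototype</caption>
      <image href="cid:photo@example.org"/>
    </storePhoto>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def build_message(payload):
    """Package the envelope (the "control" part) with binary "data"."""
    msg = MIMEMultipart("related", type="text/xml",
                        start="<envelope@example.org>")

    root = MIMEText(ENVELOPE, "xml", "utf-8")  # Content-Type: text/xml
    root["Content-ID"] = "<envelope@example.org>"
    msg.attach(root)

    attachment = MIMEApplication(payload, "octet-stream")
    attachment["Content-ID"] = "<photo@example.org>"
    msg.attach(attachment)
    return msg


if __name__ == "__main__":
    print(build_message(b"\x89PNG...").as_string())
```

The XML part stays small and readable, while the bulk data rides alongside it untouched by the XML parser.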
Discussion
From #include to the humble href in
HTML, systems abound with ways to pull together content from
multiple locations. This makes documents more maintainable and
encourages basic reuse of common components, whether they're
shared stylesheet rules or whole XML schemas. While it may seem
hard to find instances where you wouldn't want to allow
sharing of document parts and file composition, as noted under
Forces above, inclusions bring potential complexity and security
issues.
Related Patterns
You might want to make your Self-Documenting Files
refer to external documents rather than embedding them, and you
can use Composition by reusing the W3C standards for file
inclusion: XInclude and XML Base. But if you need to have
different meanings for including other files (as XSLT does with
its <import> and <include> elements), you might still
have to roll your own.
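For the reuse route, Python's standard library ships a basic XInclude processor in xml.etree.ElementInclude. The sketch below assumes a hypothetical shared file gear.xml served from an in-memory table through a custom loader. It covers XInclude only and does not deal with XML Base.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical primary document using the W3C XInclude namespace.
MAIN = """\
<widget xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="gear.xml"/>
  <part id="p2"/>
</widget>
"""

# Hypothetical satellite documents, keyed by href.
DOCS = {
    "gear.xml": '<part id="p1"><annotation>Shared gear.</annotation></part>',
}


def loader(href, parse, encoding=None):
    """Resolve hrefs from DOCS instead of the file system."""
    text = DOCS[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text  # parse="text": hand back the raw string


def load_widget():
    root = ET.fromstring(MAIN)
    ElementInclude.include(root, loader=loader)  # expands includes in place
    return root


if __name__ == "__main__":
    print(ET.tostring(load_widget(), encoding="unicode"))
```

XInclude replaces each xi:include element with the whole included element, so a format that needs different semantics, such as merging or overriding, is where rolling your own comes in.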
Known Uses
- XSLT
- XML Schemas
- WSDL
- SOAP with Attachments
References and Acknowledgments
- XML Schemas
- XSL/XSLT
- SOAP 1.2
- SOAP 1.2 Attachments
- WSDL 1.2
- XHTML
- XML Pointer, XML Base and XLink
- Dublin Core Group
- Expressing Simple Dublin Core in RDF/XML
- Programming Perl, 2nd Edition (for source of the "three great virtues of a programmer")
- thanks to Raymond Blum for pointing out that Dynamic Document and XP go together well