Architectural Design Patterns for XML Documents
by Kyle Downey
Self-Documenting Files
Abstract
Include as part of the document format elements that annotate the content.
Problem
Your human-readable format is so cryptic that it makes grown hackers cry. Consider this fragment of Perl code rendered as XML, which supposedly prints the entire Linux kernel when run:
<perlml>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
</perlml>
Note how it's much improved with just a little annotation:
<perlml>
<annotation>
You're not expected to understand this.
</annotation>
<code>
@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
</code>
</perlml>
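As written, the Perl text contains literal & and < characters, so neither fragment is well-formed XML until the code is escaped or wrapped in a CDATA section. The following is a minimal sketch in Python, using only the standard library, that checks this with one line of the Perl and then reads the annotation back out. The helper name is_well_formed and the variable names are illustrative assumptions, not part of any perlml format.

```python
import xml.etree.ElementTree as ET

# One line of the Perl fragment. It contains '&' and '<', which cannot
# appear as literal text in XML unless escaped or wrapped in CDATA.
PERL_LINE = r"close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print"


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The code dropped in as-is, the way the fragment above shows it.
raw_doc = "<perlml><code>" + PERL_LINE + "</code></perlml>"

# The same code wrapped in a CDATA section, plus the annotation.
cdata_doc = (
    "<perlml>"
    "<annotation>You're not expected to understand this.</annotation>"
    "<code><![CDATA[" + PERL_LINE + "]]></code>"
    "</perlml>"
)

print("raw parses:", is_well_formed(raw_doc))
root = ET.fromstring(cdata_doc)
print("annotation:", root.findtext("annotation"))
print("code round-trips:", root.findtext("code") == PERL_LINE)
```

Because the annotation is an ordinary element, any XML tool can pull it out without knowing anything about Perl.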
Context
Documents that are meant to be viewed by people or at least post-processed to generate documentation for people. Internal data structure formats like on-the-wire marshaling generally don't need annotation.
Forces
- You're generating complex XML content that needs to be understood by people, or converted into some format for their viewing.
- The information in the document itself is not enough to be comprehensible.
Solution
Add an element or elements to your XML schema to include
documentation. Generally you'll want to somehow tie the
documentation to each significant element, so you could consider a
base type -- for example, documentableType -- like
this:
<complexType name="documentableType">
<sequence>
<element name="annotation" type="string"/>
</sequence>
</complexType>
Discussion
XML comments are great, but if you find that they're becoming mandatory for users to decode your XML documents, maybe it's time to allow those annotations to be part of the XML itself. Probably the biggest win you get out of this (aside from standardizing where the comments go and how they're formatted using all the powerful features of XML Schema) is the ability to apply the rest of the XML toolkit to your documents. You could, for instance, write a "widgetdoc" XSLT stylesheet that takes your widget.xml files and converts them into an HTML document describing the widget. The output would include all your extra annotations, which might not mean much to the automatic widget-stamping machine that was reading the XML before, but will mean a lot to anyone debugging the machine's software.
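To make the "widgetdoc" idea concrete, here is a rough sketch of the same transformation written in Python with the standard library rather than as an XSLT stylesheet. The widget.xml content, the element names other than annotation, and the function name widgetdoc are all assumptions made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical widget.xml content; only <annotation> comes from the pattern.
WIDGET_XML = """
<widget name="sprocket">
  <annotation>Stamped from 2mm steel; check the machine manual before editing.</annotation>
  <part id="p1">
    <annotation>Tooth count must stay even &amp; below 40.</annotation>
  </part>
</widget>
"""


def widgetdoc(xml_text):
    """Render every annotated element as an entry in an HTML definition list."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        note = elem.find("annotation")  # direct child only
        if note is None:
            continue
        # Label the entry with the element name and its attributes.
        label = elem.tag + "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
        rows.append(
            f"<dt>{html.escape(label)}</dt><dd>{html.escape(note.text or '')}</dd>"
        )
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


print(widgetdoc(WIDGET_XML))
```

The widget-stamping machine can keep ignoring the annotation elements, while a documentation build picks them up.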
Related Patterns
There's a nice combination of Composition and
Self-Documenting Files. There are two well-known
formats for documentation in XML: DocBook and XHTML. DocBook is
specialized for technical documentation, and there are powerful
stylesheets out there for converting it to HTML and PDF. XHTML
is, obviously, very good for online presentation. So if you want
to be able to generate professional-quality documentation with
links and images from your own XML format, you should definitely
consider embedding XHTML or DocBook XML.
Known Uses
- XML Schema has annotations, and you can convert them to HTML using xs3p, a very snazzy schemadoc tool
- WSDL
Multipart Files
Abstract
Define an explicit mechanism for splitting content into multiple files: a primary document and satellite ones that represent faster-changing components or sections of content shared with other primary documents.
Problem
Your documents have become large and unwieldy, and you want to share pieces of them.
Context
This pattern can apply to just about any format, but it seems to be more common in the technical arena.
Forces
- As documents grow in size and complexity, and as there are more documents that can overlap, this pattern becomes more appealing.
- Pushing against its use, security and the handling of absolute versus relative URIs become issues for anyone processing the format. If that is too complicated for your taste, or if you are concerned about a cracker manipulating this facility to pull in content he or she should not have access to, you might want to disallow inclusions.
Solution
Add to your schema an <import> or <include> element
that takes an href attribute which can be any valid
relative or absolute URI. Compliant processors for your format
will load and incorporate valid subdocuments in your format from
the URI.
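A processor for such a format might resolve inclusions along the following lines. This is only a sketch under stated assumptions: the documents live in an in-memory dictionary instead of on disk or the network, the element is called include, and the policy (relative URIs only, nothing outside one root directory, no cycles) is one possible answer to the security concerns listed under Forces. All paths and function names are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# In-memory stand-in for a document store; all paths are hypothetical.
DOCS = {
    "docs/main.xml": '<widget><include href="parts/gear.xml"/></widget>',
    "docs/parts/gear.xml": "<part>gear</part>",
    "docs/evil.xml": '<widget><include href="../secret.xml"/></widget>',
    "secret.xml": "<password>not for you</password>",
}


def resolve(base_path, href, allowed_root):
    """Resolve href against the including document, rejecting unsafe targets."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc or href.startswith("/"):
        raise ValueError(f"absolute URI not allowed: {href}")
    target = posixpath.normpath(
        posixpath.join(posixpath.dirname(base_path), parts.path)
    )
    if not target.startswith(allowed_root + "/"):
        raise ValueError(f"include escapes {allowed_root}: {href}")
    return target


def expand(path, allowed_root="docs", _seen=()):
    """Return the parsed document with every <include> replaced by its target."""
    if path in _seen:
        raise ValueError(f"circular include: {path}")
    root = ET.fromstring(DOCS[path])
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == "include":
                target = resolve(path, child.get("href"), allowed_root)
                sub = expand(target, allowed_root, _seen + (path,))
                sub.tail = child.tail  # keep surrounding whitespace
                parent[i] = sub
    return root


print(ET.tostring(expand("docs/main.xml"), encoding="unicode"))
try:
    expand("docs/evil.xml")
except ValueError as err:
    print("refused:", err)
```

Resolving each href relative to the including document, not the process's working directory, is what keeps relative URIs predictable when subdocuments include further subdocuments.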
SOAP 1.1 with Attachments takes an interesting alternative approach to this problem, using Composition along the way. It co-opts the pre-existing MIME standard and allows a SOAP message to travel as a MIME multipart/related package, with the SOAP XML message as the root part and the other parts linked to it. This allows SOAP to behave something like the FTP protocol with its separate "control" and "data" streams: you can send metadata about binary content, and directives for what the recipient should do with it, as part of the XML message and attach the content itself directly to the message.
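The shape of such a package can be sketched with Python's standard email library. This is an illustration of the multipart layout only, not a working SOAP client: the storePhoto and photo elements, the Content-ID values and the payload bytes are invented for the example, and the XML part refers to the binary part through a cid: URI.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical SOAP 1.1 envelope; the body elements are made up.
SOAP_XML = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <storePhoto>
      <caption>Widget stamping line</caption>
      <photo href="cid:photo@example.org"/>
    </storePhoto>
  </soap:Body>
</soap:Envelope>"""


def build_message(xml_text, payload):
    """Package an XML 'control' part and a binary 'data' part together."""
    msg = MIMEMultipart("related", type="text/xml")

    # First part: the XML message carrying metadata and directives.
    envelope = MIMEText(xml_text, "xml", "utf-8")
    envelope["Content-ID"] = "<envelope@example.org>"
    msg.attach(envelope)

    # Second part: the raw content the XML refers to by Content-ID.
    attachment = MIMEApplication(payload, "octet-stream")
    attachment["Content-ID"] = "<photo@example.org>"
    msg.attach(attachment)
    return msg


message = build_message(SOAP_XML, b"\x89PNG placeholder bytes")
for part in message.walk():
    print(part.get_content_type(), part.get("Content-ID"))
```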
Discussion
From #include to the humble href in
HTML, systems abound with ways to pull together content from
multiple locations. This makes documents more maintainable and
encourages basic reuse of common components, whether they're
shared stylesheet rules or whole XML schemas. While it may seem
hard to find instances where you wouldn't want to allow
sharing of document parts and file composition, as noted above
under Forces there are potential complexity and security issues
with allowing inclusions.
Related Patterns
You might want to make your Self-Documenting Files
refer to external documents rather than embedding them, and you
can use Composition by reusing the W3C standards for file
inclusion: XInclude and XML Base. But if you need to have
different meanings for including other files (as XSLT does with
its <import> and <include> elements) you might still
have to roll your own.
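As a small illustration of reusing XInclude instead of inventing an inclusion element, Python's standard library ships a basic XInclude processor in xml.etree.ElementInclude. In this sketch the file name, the widget and annotation elements, and the in-memory loader are assumptions; a real processor would load from disk or the network and apply whatever access policy you need.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical satellite document, kept in memory for the example.
FILES = {
    "shared-notes.xml": "<annotation>Shared note kept in its own file.</annotation>",
}

MAIN = """<widget xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="shared-notes.xml"/>
</widget>"""


def memory_loader(href, parse, encoding=None):
    """Loader callback: return a parsed element, or None if unavailable."""
    if parse != "xml" or href not in FILES:
        return None  # ElementInclude treats None as a fatal include error
    return ET.fromstring(FILES[href])


root = ET.fromstring(MAIN)
ElementInclude.include(root, loader=memory_loader)  # expands in place
print(root.findtext("annotation"))
```

Because the loader is the only place where an href turns into content, it is also a natural spot to refuse URIs you do not want to follow.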
Known Uses
- XSLT
- XML Schemas
- WSDL
- SOAP with Attachments
References and Acknowledgments
- XML Schemas
- XSL/XSLT
- SOAP 1.2
- SOAP 1.2 Attachments
- WSDL 1.2
- XHTML
- XML Pointer, XML Base and XLink
- Dublin Core Group
- Expressing Simple Dublin Core in RDF/XML
- Programming Perl, 2nd Edition (for source of the "three great virtues of a programmer")
- thanks to Raymond Blum for pointing out that Dynamic Document and XP go together well
Comment on this Article
- Useless Article
2007-10-02 21:33:19 AdnanHasan
I think this is the most useless article I have ever read; the writer seems to be confused.
- Additional XML Patterns
2004-02-27 10:43:11 Toivo Lainevool
For more XML Patterns see: XMLPatterns.com
- Elements with only sub-elements
2003-06-05 07:40:07 Andres Becerra
I am rather new at working with XSD schemas but have been catching on. This question is for something I haven't been able to find an answer to anywhere. Maybe that means that what I want to do is not possible, but I figured I would post the question anyway.
Is there a way to force an element to be able to have sub-elements, any type of sub-elements, but no text within it?
Here is the source XML I use:
<Group attrib1="..." attrib2="...">
<Title>My Title</Title>
<Content>
<!-- in here, any other element
tags are allowed, but straight
text should not be allowed -->
</Content>
</Group>
I got the basic XSD written for this XML structure and this is what it looks like:
<xs:element name="Content">
<xs:complexType>
<xs:complexContent>
<xs:restriction base="xs:anyType">
<xs:attribute name="ContentHeight" type="xs:string" />
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
Now, using this XSD to validate a test Group.xml file, I get a message "Element cannot contain text or whitespace. Content model is empty." This is fine, I don't want the Content tag to have text or whitespace, but even when I put all the source XML on one line and ensure there is no text or whitespace, I still get that error.
If I try to do something like this:
<Group attrib1="..." attrib2="...">
<Title>My Title</Title>
<Content>
<TableView attrib1="..." attrib2="..." />
</Content>
</Group>
I get an error "Element 'Content' has invalid child element 'TableView'". This is a big problem, as I need the <Content> tag to allow any type of element, *only* elements and not straight text. After playing with the XSD a little bit, I found that allowing mixed content (i.e. see mixed="true" on <xs:complexType> below) would get rid of the "Element cannot contain text or whitespace" message. This is not my ideal approach but I can live with it for now. But with the XSD below I still get the message that "Element 'Content' has invalid child element 'TableView'".
<xs:element name="Content">
<xs:complexType mixed="true">
<xs:complexContent>
<xs:restriction base="xs:anyType">
<xs:attribute name="ContentHeight" type="xs:string" />
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
Any pointers to get around this would be greatly appreciated.
- Elements with only sub-elements
2003-06-05 08:17:21 Kyle Downey
Without hacking around with the schemas, it's hard for me to say for sure, but I believe the problem is in how you decided to do inheritance. You're using a restriction with base any, which means rather than allowing anything, you're taking "any" and reducing it to no content, plus the attribute you named. You want an <xs:extension/> element instead. See my follow-on article; it includes an example of inheritance-by-extension. But I think there's an easier way to do this which doesn't require any inheritance:
<complexType>
<sequence>
<any maxOccurs="unbounded"/>
</sequence>
<attribute name="ContentHeight" type="string"/>
</complexType>
Note that if your intent was to allow any element but wanted to require the ContentHeight attribute on the children rather than the <Content> element, your example would be the way to go, assuming you use <xs:extension/>.
As one more note, unless mixed="true", the default content type is "elementOnly." To be explicit you could do this:
<complexType mixed="false">
...
</complexType>
but it should be unnecessary.
- Elements with only sub-elements
2003-06-06 07:19:16 Andres Becerra
I came to a solution based on posts by different people. I wanted to share the solution in case anyone ran into this problem in the future. Below is the XSD I ended up using:
<xs:element name="Content">
<xs:complexType>
<xs:sequence>
<xs:any minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="ContentHeight" type="xs:string" />
</xs:complexType>
</xs:element>
Another option I had was to tightly validate exactly what valid elements are allowed in <Content> elements, using this XSD schema snippet:
<xs:element name="Content">
<xs:complexType>
<xs:complexContent>
<xs:restriction base="xs:anyType">
<xs:all>
<xs:element ref="label" minOccurs="0" />
<xs:element ref="TableView" minOccurs="0" />
</xs:all>
<xs:attribute name="ContentHeight" type="xs:string" />
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
I ended up going with the 1st option simply because it was more flexible over the long run. In the near future, there may be transformers written for new elements that can be placed in the <Content> element, so I didn't want to restrict it too much.... just as long as no plain text was allowed in the <Content> element, only sub-elements.
Cheers,
Andres
2003-04-04 03:19:36 Victor Lindesay
Great article, Kyle, thanks.
However, I would disagree with your recommendation to re-use other schemas when modelling XML. I recommend that you use your own element names when data modelling a particular domain. As long as there is a 'semantic' link, explicitly declared or otherwise, for example between my <author/> and Dublin Core <creator/>, then I can easily transform my schema and my look at the world into a more interoperable markup, say using XSLT. This indirection will avoid problems when so-called 'standards' like Dublin Core change, which they inevitably do. Using schemas outside your physical control in mission-critical applications is very risky.
- Hmmmm
2003-03-31 08:45:06 Robin Berjon
Using xml.com to vent your dislike for Perl may amuse you, but you might want to consider learning how to write well-formed XML other than with a marshaller before going there. Your snippet involving "perlml" won't parse.
- Hmmmm
2003-03-31 09:03:50 Kyle Downey
You're assuming I dislike Perl, which is incorrect. But thank you for pointing out the error: I neglected to put in CDATA tags because of XML-illegal characters in the text <!CDATA[ ]]>.
- Hmmmm (correction)
2003-03-31 09:07:45 Kyle Downey
The correct format for CDATA should be <![CDATA[ ... ]]>.
