Designing a New Schema with XML Design Patterns
Each component model allows you to write components in many languages, but these binary formats are friendly to just one language: C. In this article we're going to start a project to design an XML format that could be generated from IDL, TLB, XPT, or any other representation of a portable component. Any language that supports XML could then read that format and manipulate it further to generate GUI tools, documentation, and more.
We could imagine an instance document like this one, which defines an interface GreetingFactory with a single operation, string sayHello(in string personName):
<tlx:typelib xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html">
  <tlx:interface name="GreetingFactory"
                 id="8bb35ed9-e332-462d-9155-4a002ab5c958">
    <tlx:operation name="sayHello" type="string">
      <tlx:in name="personName" type="string"/>
    </tlx:operation>
  </tlx:interface>
</tlx:typelib>
The Dynamic Document pattern suggests not writing a schema at all: the document simply follows the program's data structures, and parsing is done with .NET marshalling or the JavaBeans long-term persistence mechanism from JDK 1.4. If we adopted it, we could end the article right here. It should be immediately clear that this option is right out: the point of the project is to create a stable, portable format, and we don't want it influenced by changes in the language we use to parse or generate it. Also, unlike Ant, say, this is a fairly self-contained project, so we don't need to let arbitrary people write extensions without incorporating them into the schema.
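To see why, here is roughly what JDK 1.4's java.beans.XMLEncoder writes for a bean describing the sayHello operation. The class com.example.tlx.Operation and its name and type properties are hypothetical, and the exact version attribute and layout vary with the JDK, but the shape is typical: the document is organized around Java class and property names, so renaming a class or a getter silently changes the file format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of XMLEncoder output; com.example.tlx.Operation is a
     hypothetical bean class with "name" and "type" properties -->
<java version="1.4.1" class="java.beans.XMLDecoder">
 <object class="com.example.tlx.Operation">
  <void property="name">
   <string>sayHello</string>
  </void>
  <void property="type">
   <string>string</string>
  </void>
 </object>
</java>
```

A Python or C++ reader would have to understand Java bean conventions to make sense of this, which is exactly the coupling a portable type-library format should avoid.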
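There are two types of composition: reuse of your own schema elements and reuse of other people's schema elements. We'll endeavor to do both for this project. To start, we need some requirements. The format should:
(1) define interfaces and their operations;
(2) carry vendor-specific information, such as Mozilla and Microsoft extensions;
(3) include human-readable documentation; and
(4) record metadata about the file itself, such as what generated it, when, and from what source.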
Requirement (1) is core: it wouldn't be a type library without defining the interfaces and operations, and we can't borrow this from anywhere else. Requirement (2) is also central, but one thing we have to consider for composition is who controls the schema and how fast it changes. The idea of an interface and a method call will not change over the lifetime of this schema, but vendors come and go. We should put Mozilla- and Microsoft-specific information in their own schemas and compose them into the larger schema.
Requirement (3) screams for some petty schema larceny. Documentation nowadays should be hypertext and convertible into other formats. Both DocBook/XML and XHTML's more basic elements fit the bill. Since HTML is familiar to most programmers, we'll choose to compose XHTML.
Finally, requirement (4) can be met with RDF and the Dublin Core. There's no point in reinventing file metadata, and there are plenty of tools out there for reading RDF. With our tools chosen, let's go back to our first example and mark up the document using all our borrowed schemas:
<tlx:typelib xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html">
  <!-- for generated code, we mostly want to know what version of the
       program created it, when it was last (re)generated, and what was
       the original source; Dublin Core offers all of these for our use -->
  <tlx:metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description>
        <dc:creator>xpidl2tlx 1.0</dc:creator>
        <dc:date>2003-05-02</dc:date>
        <dc:source>file:c:/oss/mozilla/modules/libreg/xpcom/nsIRegistry.idl</dc:source>
      </rdf:Description>
    </rdf:RDF>
  </tlx:metadata>
  <tlx:interface name="GreetingFactory"
                 id="8bb35ed9-e332-462d-9155-4a002ab5c958">
    <!-- documentation is nested inside the object being documented;
         the text chosen here is from the javadoc-type description in
         the original IDL -->
    <tlx:documentation>
      <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">
        Interface for creating custom greetings.
      </xhtml:p>
    </tlx:documentation>
    <tlx:operation name="sayHello" type="string">
      <tlx:documentation>
        <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">
          Returns a greeting that incorporates the person's name.
        </xhtml:p>
      </tlx:documentation>
      <tlx:in name="personName" type="string">
        <!-- documentation is nested inside the parameter being documented -->
        <tlx:documentation>
          <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">
            name of the person to greet
          </xhtml:p>
        </tlx:documentation>
      </tlx:in>
    </tlx:operation>
  </tlx:interface>
</tlx:typelib>
What have we gained? It seems like a lot of added complexity and bulk. As in the original pattern catalog, the reason we use XHTML or RDF when we need their services is to get ourselves out of the business of defining tags in those domains. We care about typelibs, not hypertext. If someone wants to put a hyperlink in the documentation, we don't need to define a tag: XHTML has one already. Also, using a well-known schema makes it easier to use shared tools: an RDF-aware document-management system would immediately recognize the creation date and creator, while it would likely ignore the same information buried in our own "create-date" and "creator" tags.
We've also satisfied another common schema pattern along the way: we've made our format self-documenting. While not child's play, it's still much easier to write a stylesheet to convert the above document to HTML than it is to, say, parse XPIDL and javadoc-style tags to produce it. A GUI browser tool can also provide human descriptions for cryptically named functions and parameters.
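As a sketch of that claim, here is a minimal XSLT 1.0 stylesheet that turns a TLX document like the one above into an HTML page. It is an illustration, not part of the TLX project: it assumes XInclude has already been resolved, skips the RDF metadata and parameter documentation, and re-creates the embedded XHTML elements without a namespace so that a browser's HTML parser sees plain p, div, and a elements.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tlx xhtml">
  <xsl:output method="html"/>

  <!-- One HTML page per type library; tlx:metadata is not rendered -->
  <xsl:template match="/tlx:typelib">
    <html>
      <body>
        <xsl:apply-templates select="tlx:interface"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each interface: a heading, its documentation, its operations -->
  <xsl:template match="tlx:interface">
    <h2>interface <xsl:value-of select="@name"/></h2>
    <xsl:apply-templates select="tlx:documentation"/>
    <ul>
      <xsl:apply-templates select="tlx:operation"/>
    </ul>
  </xsl:template>

  <!-- Rebuild an IDL-style signature, e.g. string sayHello(in string personName) -->
  <xsl:template match="tlx:operation">
    <li>
      <code>
        <xsl:value-of select="concat(@type, ' ', @name, '(')"/>
        <xsl:for-each select="tlx:in">
          <xsl:if test="position() &gt; 1">, </xsl:if>
          <xsl:value-of select="concat('in ', @type, ' ', @name)"/>
        </xsl:for-each>
        <xsl:text>)</xsl:text>
      </code>
      <xsl:apply-templates select="tlx:documentation"/>
    </li>
  </xsl:template>

  <!-- Documentation may be plain text or XHTML; process its children -->
  <xsl:template match="tlx:documentation">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Copy each XHTML element as a no-namespace HTML element, keeping attributes -->
  <xsl:template match="xhtml:*">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Text nodes fall through to XSLT's built-in template, which copies them, so the same stylesheet works whether documentation holds XHTML paragraphs or plain strings.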
Our last architectural pattern is Multipart Files. This pattern suggests we should offer our document creators a way to compose a single, coherent document out of many. Type libraries define interfaces, and interfaces can inherit from other interfaces. We may thus want to define our common base interfaces in one file, much like we'd define and #include a header in C or C++. We have two choices for implementing this feature: borrow or invent. We're on a roll borrowing, so here's what it would look like using XInclude, the W3C Candidate Recommendation for this purpose:
<tlx:typelib xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html"
             xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="base.tlx"/>
  <!-- ... -->
</tlx:typelib>
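The included file is ordinary TLX markup. Because the xi:include element is replaced by the included document's root element, base.tlx should have a root that the typelib content model accepts in that position, such as a single tlx:interface. A minimal hypothetical base.tlx might look like this; the NamedObject interface and its getName operation are made up for illustration, and id is omitted because the TLX schema declares it optional.

```xml
<tlx:interface xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html"
               name="NamedObject">
  <!-- hypothetical shared base interface; id is optional, so it is omitted -->
  <tlx:operation name="getName" type="string"/>
</tlx:interface>
```

If base.tlx were itself a complete tlx:typelib, inclusion would nest one typelib inside another, which the content model does not allow.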
We've gone from a blank sheet of paper and a couple of requirements to sketches of the file format we want to support, plus a set of patterns to guide us: Multipart Files, Self-Documenting Formats, and Composition. There are a number of capable XML Schema tutorials out there, so the focus here will be on the XML Schema details that support incorporating other schemas into our design.
We're composing three different schemas for this design, so we need to import them so we can refer to their elements.
<schema targetNamespace="http://schema.amberarcher.com/polaris/tlx.html"
        xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xmlns:xi="http://www.w3.org/2001/XInclude"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
  <import namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          schemaLocation="http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-rdf.xsd"/>
  <import namespace="http://www.w3.org/1999/xhtml"
          schemaLocation="http://schema.amberarcher.com/polaris/w3c/xhtml1.1.xsd"/>
  <import namespace="http://www.w3.org/2001/XInclude"
          schemaLocation="http://schema.amberarcher.com/polaris/w3c/xinclude.xsd"/>
For the W3C schemas, the first snag I hit was that there's not yet an authoritative URI for either the XHTML 1.1 or XInclude XML schemas. I had to download the zip files that come with the specifications and put them up on my server. Since these are snapshots of the versions at this time, I do not recommend that you reuse my URLs (my ISP will thank you as well); they may not stay up-to-date.
With the full contents of these three namespaces at our disposal, we can reference their elements. Let's define our top-level typelib element and say it's a sequence of two optional elements followed by one or more interfaces. That looks like this:
<element name="typelib">
  <complexType>
    <sequence>
      <element ref="tlx:metadata" minOccurs="0"/>
      <element ref="xi:include" minOccurs="0"/>
      <element ref="tlx:interface" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>

<element name="metadata">
  <complexType>
    <sequence>
      <element ref="rdf:RDF"/>
    </sequence>
  </complexType>
</element>
We use <element ref="xxx:yyy"> whenever we want to link to another schema's element. We have to define a namespace prefix (xi or rdf in this case) to scope the name, and then we can pick any element that schema declares globally. Note that this doesn't have to be the schema's root element, as we'll see in the next section.
Implementing multi-part documents on the schema side took just one line: the reference to xi:include in the typelib content model. Supporting it on the document-processing side may be easy if your XML parser has native XInclude support; otherwise you may have to wire it in yourself using one of the known implementations. There are versions for Java, .NET, and …
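What a processor hands your application is the merged document. Assuming base.tlx contains a single hypothetical NamedObject interface with one getName operation, the typelib from the Multipart Files example would come out roughly as shown here. XInclude 1.0 specifies base-URI fixup, which adds an xml:base attribute to the included element; if you validate after inclusion rather than before, the TLX schema would need to allow xml:base on tlx:interface, which the declarations in this article do not yet do.

```xml
<tlx:typelib xmlns:tlx="http://schema.amberarcher.com/polaris/tlx.html"
             xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- the xi:include element is gone; base.tlx's root element takes its
       place. xml:base records its origin (relative or absolute,
       depending on the processor) -->
  <tlx:interface name="NamedObject" xml:base="base.tlx">
    <tlx:operation name="getName" type="string"/>
  </tlx:interface>
  <!-- ... the typelib's own interfaces follow, unchanged ... -->
</tlx:typelib>
```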
What I intended to do with XHTML was to reuse its <div> and <p> elements as allowed container elements for documentation, so that our embedded documentation could have hyperlinks and other markup. This can be done with W3C XML Schema like this:
<element name="documentation">
  <complexType>
    <choice>
      <element ref="xhtml:p"/>
      <element ref="xhtml:div"/>
    </choice>
  </complexType>
</element>
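With that declaration, a documentation element would hold exactly one XHTML p or div, and inside it whatever XHTML markup those elements allow, such as a hyperlink. The fragment below assumes the tlx prefix is declared on an enclosing element, and the link target and text are placeholders rather than a real page:

```xml
<tlx:documentation>
  <!-- one div, as the choice allows; the div itself may hold several paragraphs -->
  <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:p>
      Returns a greeting that incorporates the person's name.
    </xhtml:p>
    <xhtml:p>
      See the <xhtml:a href="http://www.example.com/greetings.html">greeting
      guidelines</xhtml:a> for details.
    </xhtml:p>
  </xhtml:div>
</tlx:documentation>
```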
Easy, right? I completed the schema, validated the W3C XML Schema code itself without trouble, and then tried to validate an instance document using XSV. I got an explosion of validation errors, all from what seemed to be the XHTML 1.1 schema itself; mailing-list and newsgroup messages on the topic seemed to indicate that this problem has been around a while. I tried to find variants of the schema published elsewhere, but the only one I could dig up was based on a draft version of W3C XML Schema.
Although the XHTML working group has moved on to XHTML 2.0, my understanding is that the finished work for 1.1 is in the DTD, while Modularization of XHTML in XML Schema is still just a W3C working draft as of 9 December 2002. I've written to the lead author of the specification and at the time of writing have yet to hear back. I think the lesson for anyone applying Composition is to be very careful about choosing stable specifications. Don't assume that a compliant W3C XML Schema is available for any incomplete specification until you've tested it yourself. The flip side of being able to rely upon someone else to define the elements for your problem domain is that you have to depend upon them.
For now the test schema for TLX has the XHTML import commented out and the documentation tag is declared more simply as:
<element name="documentation" type="string"/>
With these changes to the test schema and the example document, XSV blessed the document as valid.
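One workaround the article doesn't pursue is to declare documentation with a wildcard that admits XHTML-namespace elements without validating them. This sketch uses the standard XML Schema any wildcard, written unprefixed to match the schema's default namespace. With processContents="skip" the validator never consults an XHTML schema, so the problematic XHTML 1.1 schema would not be needed, and mixed="true" keeps plain-text documentation valid. The cost is that malformed XHTML inside documentation would go unnoticed by the validator.

```xml
<!-- Sketch: accept plain text and/or any elements from the XHTML
     namespace, without validating them against an XHTML schema -->
<element name="documentation">
  <complexType mixed="true">
    <sequence>
      <any namespace="http://www.w3.org/1999/xhtml"
           processContents="skip"
           minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>
```

Switching processContents to "lax" would validate XHTML content only when declarations for it are available, which would let the schema pick up a working XHTML schema later without changing instance documents.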
Even though I'm not yet able to support validation and embedded XHTML in the documentation at the same time, there's one final nice touch to put in the schema. A number of elements can have embedded documentation: interfaces, operations, and parameters. We can use inheritance in XML Schema to prevent duplication in our new schema. First we define a base type, documentableType. We declare it abstract because we don't want any elements that use this type directly, just types derived from it:
<complexType name="documentableType" abstract="true">
  <sequence minOccurs="0">
    <element ref="tlx:documentation"/>
  </sequence>
</complexType>
To finish, we just have to derive the complexTypes for each of our documentable elements from documentableType using W3C XML Schema's extension mechanism. For an interface, that looks like this:
<element name="interface">
  <complexType>
    <complexContent>
      <extension base="tlx:documentableType">
        <sequence>
          <element ref="tlx:operation" maxOccurs="unbounded"/>
        </sequence>
        <attribute name="name" type="string"/>
        <attribute name="id" type="string" use="optional"/>
      </extension>
    </complexContent>
  </complexType>
</element>
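Operations and parameters can follow the same recipe. Their declarations aren't shown here, so the following is a sketch inferred from the instance document: an operation has name and type attributes and zero or more tlx:in parameters, and a parameter has only name and type. Other parameter directions, such as IDL's out and inout, are left out.

```xml
<!-- Sketch: operations and their in-parameters, both documentable -->
<element name="operation">
  <complexType>
    <complexContent>
      <extension base="tlx:documentableType">
        <sequence>
          <element ref="tlx:in" minOccurs="0" maxOccurs="unbounded"/>
        </sequence>
        <attribute name="name" type="string"/>
        <attribute name="type" type="string"/>
      </extension>
    </complexContent>
  </complexType>
</element>

<element name="in">
  <complexType>
    <complexContent>
      <!-- no child elements beyond the inherited optional documentation -->
      <extension base="tlx:documentableType">
        <attribute name="name" type="string"/>
        <attribute name="type" type="string"/>
      </extension>
    </complexContent>
  </complexType>
</element>
```

Because extension appends the derived type's content after the base type's, an instance lists documentation before any parameters, which is the order used in the marked-up GreetingFactory example.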
When you set out to design your own XML Schema, you do not need to start from scratch. You can either follow the patterns exemplified by the growing body of working schemas on the Internet, from the W3C to OASIS, or directly reuse their elements through Composition. It's worth the effort: you get the value of a language specific to your domain without the trouble of writing your own parser, and the end result can be used across multiple languages with ease.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.