XML.com 
 Published on XML.com http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html

Profiling XML Schema
By Paul Kiel
September 20, 2006

XML Schema is now five years old, having matured from a newborn into an active youngster. So what have we learned about this young one's personality? We've always known it was complex. Indeed, the original debate about whether to make it a Recommendation reflected concern about that complexity. (See Last Word and Questionnaire.) This rich toolset has left schema designers wondering which features they should or should not use. If we analyze what people are actually implementing, perhaps we can glean some guidance. I decided to embark on a quest to see if we can put together a profile of XML Schema based on experiences thus far.

Background

A profile is a set of agreed-upon practices reflecting the most commonly accepted usage patterns of a given technology. A usage profile of XML Schema would identify the features that are commonly implemented and supported in tools.

The concept of a profile of XML Schema has been debated over the years. In 2004, the Web Services Interoperability Organization (WS-I) went so far as to charter a work group (PDF) to examine the idea of a formal profile for XML Schema. That effort led the W3C to hold a Workshop on XML Schema 1.0 User Experiences, where input from many sources was culled into a plan of action. The resultant work in the XML Schema Patterns for Databinding Working Group is ongoing, and the group recently issued a draft document.

In addition, many industry consortia have issued design guidelines and/or patterns for developing libraries of schemas according to their profile. Having read many of these, either formally or informally, I find that they are often explicit about which features of XML Schema are allowed or disallowed. Indeed, tools for enforcing schema profiles have emerged. Schematron is often used to layer additional constraints on top of the ones in a schema. Mindreef's SOAPScope Server has explicit support for creating customizable profiles of schema, offering a standard check box listing of constructs that are enforced in test cases. However, I wondered if there was a cross-industry usage profile.

Strengths and Weaknesses

The first stop in the search for a profile is an analysis of XML Schema itself. In fact, for some time now, we've known about the pluses and minuses of many schema design techniques, as put together by Roger Costello on his Best Practices website. He has done an excellent job of gathering commentary and opinion from implementers, and assembling it into a cohesive analysis of the benefits and drawbacks of many design criteria. Wise schema designers have referred to this site for years.

On a practical level, schema designers may also wonder what impact each feature of XML Schema has down the road when it comes time for implementation. Some constructs are ubiquitous and well supported in IDEs and other tools such as code generating software. However, there are some that are not commonly used and can cause problems when coders are confronted with a lack of tool support.

What Are Schema Designers Actually Doing?

In the next phase of my profile quest, I wanted to take what Costello has done a step further and ask: what features of XML Schema are folks actually using? Is there a consensus of opinion on the most common constructs? Are there features schema designers are avoiding? I accumulated data on over 1,400 schemas from numerous standards consortia to see if there is a common XML Schema profile reflecting a consensus of practice.

I focused on consortia schemas thinking that these should reflect a group's consensus of design criteria as well as have a disproportionate impact on the marketplace, as they are standard and implemented many times across a domain. These schemas are also all freely available.

The Sources

I examined schemas from the following organizations:

There are many other consortia that could, and with infinite time would, be added to this analysis.

Tool Support

Many of the more mature tools have a high level of support for XML Schema features. In particular, IDEs that edit schemas have good support even for problematic features such as xsd:union elements. The problem with tool support comes in two forms. First, some tools have chosen not to support selected schema constructs. This amounts to a profile by design. Second, "best-of-breed" tools may support only the most common schema constructs in their early releases, with support for additional features planned as the tool matures. I've blogged about tool support of XML Schema and shown references to a few code-generation tools and their publicly available claims of support.

The Results: A Profile of XML Schema

There is a clear tendency toward simplicity in the usage patterns of the 1,414 schemas tested. Just six design features are used in at least one-third of the schemas, while 17 features occur in 10 percent or less of the test cases. Many of the avoided constructs would be used only in very specific situations with a special need. In addition to simplicity, explicitness is a secondary pattern, reflected in the high level of element namespace qualification, the lack of mixed content models and abstract types, and the overwhelming preference for xsd:sequence over xsd:all compositors.
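The bucketing behind these three groups can be sketched in a few lines of Python. This is only an illustration, not the analysis actually used for the article: the function name `bucket` and the feature names and counts in `EXAMPLE_COUNTS` are invented, and the lower boundary follows the "10 percent or less" wording above. Only the total of 1,414 schemas and the two thresholds come from the text.

```python
from fractions import Fraction

TOTAL_SCHEMAS = 1414  # number of schemas tested, as stated in the article


def bucket(schemas_using, total=TOTAL_SCHEMAS):
    """Place a feature in one of the article's three groups.

    schemas_using is the number of schemas that contain the feature
    at least once. Fractions avoid floating-point edge cases at the
    one-third and 10 percent boundaries.
    """
    share = Fraction(schemas_using, total)
    if share >= Fraction(1, 3):
        return "frequent"  # used in at least one-third of the schemas
    if share <= Fraction(1, 10):
        return "avoided"   # used in 10 percent or less of the schemas
    return "middle"        # everything in between


# Invented counts for illustration only; these are NOT the article's data.
EXAMPLE_COUNTS = {"feature_a": 900, "feature_b": 300, "feature_c": 40}

for name, count in EXAMPLE_COUNTS.items():
    print(name, bucket(count))
```

With 1,414 schemas, a feature needs to appear in at least 472 of them to count as frequent, and in 141 or fewer to count as avoided.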

Features Avoided (Occurring in Less than 10 Percent of the Schemas)

These XML Schema design features were either used minimally or not at all in the schemas tested.

Features Used Frequently (Occurring in at Least One-Third of the Schemas)

These are the most commonly used XML Schema design features. Here again, simplicity rules. It is wise to begin schema design with this toolset.

Problems in the Middle

These XML Schema design features were commonly used but may have problematic tool support. It is a good idea to check with the tools you plan to use before adding these features to your design.

A Note on Wildcards

Wildcards fell in the middle of the list; however, I suspect that they are actually used much more frequently than the counts suggest. Some of these consortia create a single wildcard extension element, which is simply referred to (with "@ref") wherever it is needed. So the number of wildcard declarations counted is lower than the number of places where those wildcards are actually in use.
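The effect is easy to see in a small sketch. The schema below is hypothetical (the names UserArea, OrderType, and InvoiceType are invented for illustration and not taken from any consortium's library), and the function `wildcard_usage` is an assumed helper, not part of the original analysis. It counts xsd:any declarations separately from references to the global element that carries the wildcard.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: one global extension element holds the only wildcard,
# and two complex types pull it in by reference.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="UserArea">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:any namespace="##any" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="OrderType">
    <xsd:sequence>
      <xsd:element name="Id" type="xsd:string"/>
      <xsd:element ref="UserArea" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="InvoiceType">
    <xsd:sequence>
      <xsd:element name="Number" type="xsd:string"/>
      <xsd:element ref="UserArea" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def wildcard_usage(xsd_text):
    """Return (wildcard declarations, references to wildcard-carrying elements)."""
    root = ET.fromstring(xsd_text)
    # Every xsd:any anywhere in the schema is one declaration.
    declared = len(list(root.iter(XSD + "any")))
    # Global elements (direct children of xsd:schema) that contain a wildcard.
    carriers = {
        el.get("name")
        for el in root.findall(XSD + "element")
        if el.find(".//" + XSD + "any") is not None
    }
    # Element references whose local name matches one of those carriers.
    referenced = sum(
        1
        for el in root.iter(XSD + "element")
        if el.get("ref") is not None
        and el.get("ref").split(":")[-1] in carriers
    )
    return declared, referenced


declared, referenced = wildcard_usage(SCHEMA)
print(declared, referenced)
```

Here a simple count of xsd:any finds one wildcard, even though two types are open to extension through it. The sketch matches references by local name only and ignores namespaces, which is a simplification.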

Conclusion

Examining what schema designers are actually implementing can indeed reveal a usage profile of XML Schema. It is in this profile of practice that our five-year-old's personality emerges. The clearest message is one of simplicity. The most commonly used constructs involve merely creating reusable types, assembling them into sequences of elements, and augmenting them with enumerations. Many of the more complex features went unused. In addition, the test cases also reflected explicitness in their schemas, as evidenced in the avoidance of mixing or abstracting content and the qualifying of element form defaults. Adhering to the design patterns reflected in this usage profile will serve schema designers well.

Appendix: The Data

The data in these tables show the results of my research. The schemas were all downloaded in early September 2006 from their respective websites (many of them are listed here). Figure 1 is a summary, Figure 2 indicates how many schemas contain the XML Schema design feature listed, and Figure 3 shows the number of times the feature occurred.

Figure 1. Summary of data

Figure 2. Number of schemas using XML Schema features

Figure 3. Number of occurrences of XML Schema features
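The difference between the two kinds of count (schemas that use a feature versus total occurrences of that feature) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tooling used to produce the figures: the two inline schemas and the names `SAMPLES`, `count_features`, and `tally` are invented, and a "feature" is simplified here to mean an element in the XML Schema namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical sample schemas, inlined so the sketch is self-contained.
SAMPLES = {
    "a.xsd": """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Status" type="StatusType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:simpleType name="StatusType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="active"/>
      <xsd:enumeration value="inactive"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>""",
    "b.xsd": """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Note">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Text" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>""",
}


def count_features(xsd_text):
    """Count XML Schema elements (by local name) in a single schema."""
    root = ET.fromstring(xsd_text)
    prefix = "{" + XSD_NS + "}"
    counts = Counter()
    for node in root.iter():
        if node.tag.startswith(prefix):
            counts[node.tag[len(prefix):]] += 1
    return counts


def tally(schemas):
    """Return (schemas using each feature, total occurrences of each feature)."""
    schemas_using = Counter()
    occurrences = Counter()
    for text in schemas.values():
        counts = count_features(text)
        occurrences.update(counts)            # add every occurrence
        schemas_using.update(counts.keys())   # add one per schema
    return schemas_using, occurrences


schemas_using, occurrences = tally(SAMPLES)
print(schemas_using["enumeration"], occurrences["enumeration"])
```

In this toy input, xsd:enumeration appears in only one of the two schemas but occurs twice, which is the distinction Figures 2 and 3 draw. Features expressed as attributes (for example, mixed content or abstract types) would need attribute checks that this sketch leaves out.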

Figure Notes

A few duplicate schemas were removed from the analysis, such as the schema for schemas (XMLSchema.xsd), which was commonly distributed with many libraries. ACORD also offers no-namespace equivalents of its schemas; for this analysis, the namespaced versions were used. In both the HR-XML and OAGi test files, the developer or "non-standalone" versions of the schemas were analyzed. While there are no substitutionGroups in the OAGi schemas, the global element design is intended to enable substitutions as an extension point. The W3C list of schemas includes MathML.

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.