XML.com: XML From the Inside Out

Profiling XML Schema
by Paul Kiel

The Results: A Profile of XML Schema

The usage patterns of the 1,414 schemas tested show a clear tendency toward simplicity. Just six design features are used in at least one third of the schemas, while 17 features occur in 10 percent or less of the test cases. Many of the avoided constructs would only be used in very specific situations with a special need. In addition to simplicity, explicitness is a secondary pattern, reflected in the high level of element namespace qualification, the lack of mixed content models and abstract types, and the overwhelming preference for xsd:sequence over xsd:all compositors.

Features Avoided (Occurring in Less than 10 Percent of the Schemas)

These XML Schema design features were either used minimally or not at all in the schemas tested.

  • xsd:all: The use of the xsd:all compositor. The clear preference is to use xsd:choice or xsd:sequence instead.
  • Finalizing: The use of the @final or @finalDefault attributes. None of the tested schemas used these. The test cases seem to focus on enabling features rather than disabling ones such as these.
  • substitutionGroup: Allows elements to be substituted for other elements. While this feature was not used in the schemas tested, it is a common extension mechanism. The lack of substitutionGroups is probably due to the nature of the test cases. Open, standard domain consortia schemas may not need this feature, although organizations implementing them may need substitutionGroups for extensibility.
  • Uniqueness: The use of the unique element requiring the contents to be distinctive within its scope. I found this interesting, as the need for unique IDs was common; however, most often simple strings were used for that data. Perhaps uniqueness is enforced in the business layer of a data transfer between systems.
  • Qualified attributes: This is the use of attributeFormDefault="qualified". Almost none of the schema designers felt the need to do this, although the vast majority qualified the elements.
  • Keys: The use of the key and keyref elements.
  • Redefine: Using the redefine element to change the definition of an existing component. This feature is also among the least supported in tools. Consistently avoided.
  • Nillable: The use of the @nillable attribute, allowing the use of xsi:nil in the instance indicating the contents have a null value.
  • Block: The use of the @block attribute to disallow derivations.
  • complexType restriction: Restricting the content model of a complexType. Years ago at HR-XML, we looked at this feature to enable us to have a generalized data type that is constrained depending on the context of its usage. However, we found restricting complexTypes to be cumbersome, verbose, and not well supported at that time.
  • Abstract types: The use of abstract="true" on elements or types.
  • Mixed: Setting the attribute mixed="true" combines data and child elements in one place. Schema designers have clearly separated these concepts into distinct types.
  • Groups: The use of xsd:group is a way to define a group for later reuse. It may be that elements were simply referred to with "@ref" rather than put into groups.
  • Fixed values: The @fixed attribute on elements, attributes, or simpleTypes.
  • No @targetNamespace attribute: Having no @targetNamespace may be used in late binding of schemas. However, the vast majority of test cases used it. I have almost always seen it as a requirement in schema design guides.
  • No default namespace declared: Containing no "@xmlns" default namespace (no prefix). Again, this may be used in late binding. Default namespaces were consistently used in the test cases.
  • Default namespace not equal to @targetNamespace: This occurs when the default namespace value does not match the @targetNamespace. Not only did the vast majority of schemas have both a default and a @targetNamespace, but they were the same value. This reflects the tendency toward simplicity.
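
The three namespace checks at the end of this list can be sketched in a few lines of Python. This is a minimal illustration, not the analysis actually used for the article: the function name namespace_profile, both sample schemas, and the http://example.org/hr namespace are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def namespace_profile(schema_text):
    """Report how a schema document declares its namespaces (illustrative only)."""
    default_ns = None
    root = None
    source = io.BytesIO(schema_text.encode("utf-8"))
    # "start-ns" events are reported before the "start" event of the
    # element that carries the xmlns declarations.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            if not prefix:  # an empty prefix marks the default namespace
                default_ns = uri
        else:
            root = payload  # the first element seen is the xsd:schema root
            break
    target_ns = root.get("targetNamespace")
    return {
        "has_target_namespace": target_ns is not None,
        "has_default_namespace": default_ns is not None,
        "default_equals_target": default_ns is not None and default_ns == target_ns,
    }


# Hypothetical schema following the common pattern: default namespace == targetNamespace.
TYPICAL = (
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'xmlns="http://example.org/hr" '
    'targetNamespace="http://example.org/hr"/>'
)

# Hypothetical schema with neither a targetNamespace nor a default namespace.
NO_NAMESPACE = '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>'

typical_profile = namespace_profile(TYPICAL)
bare_profile = namespace_profile(NO_NAMESPACE)
```

Only declarations on the root element are examined, which is an assumption about where schema authors put them.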

Features Used Frequently (Occurring in at Least One-Third of the Schemas)

These are the most commonly used XML Schema design features. Here again, simplicity rules. It is wise to begin schema design with this toolset.

  • Namespace qualified elements: Use elementFormDefault="qualified" for explicitness of the element namespaces.
  • xsd:sequence: The use of the xsd:sequence element. The most common compositor. I've recommended this over xsd:all because it leads to fewer ambiguous content models and its child elements occur in a predictable order.
  • complexType extension: Creating a type that extends another is a key reusability and extensibility point.
  • Anonymous types: These occur when types are created that are locally scoped and thus have no "@name" attribute. Very commonly done. Tools may prefer no anonymous types, but none I have tried was unable to accommodate them.
  • simpleType restriction: Derivation of a simpleType restricting its base type.
  • Enumerations: Enumerated values were the most frequently used construct.
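
To make the six features above concrete, here is a small schema that uses all of them, wrapped in a Python check that looks for each one. It is a sketch under stated assumptions: the schema, its type names, and the frequent_features function are invented for illustration and are not drawn from the tested schemas.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema that uses only the six frequently used features.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/hr"
            targetNamespace="http://example.org/hr"
            elementFormDefault="qualified">
  <xsd:simpleType name="StatusType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="active"/>
      <xsd:enumeration value="inactive"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="EmployeeType">
    <xsd:complexContent>
      <xsd:extension base="PersonType">
        <xsd:sequence>
          <xsd:element name="Status" type="StatusType"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="Staff">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee" type="EmployeeType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def tag(local_name):
    """Build the namespace-qualified tag name that ElementTree expects."""
    return f"{{{XSD_NS}}}{local_name}"


def frequent_features(schema_text):
    """Return a flag per feature saying whether the schema uses it."""
    root = ET.fromstring(schema_text)

    def has(path):
        return root.find(path) is not None

    # A type definition without a name attribute is locally scoped (anonymous).
    anonymous = any(
        node.get("name") is None
        for kind in ("complexType", "simpleType")
        for node in root.iter(tag(kind))
    )
    return {
        "qualified elements": root.get("elementFormDefault") == "qualified",
        "xsd:sequence": has(f".//{tag('sequence')}"),
        # Only extension under complexContent is checked in this sketch.
        "complexType extension": has(f".//{tag('complexContent')}/{tag('extension')}"),
        "anonymous types": anonymous,
        "simpleType restriction": has(f".//{tag('simpleType')}/{tag('restriction')}"),
        "enumerations": has(f".//{tag('enumeration')}"),
    }


features = frequent_features(SCHEMA)
```

Parsing the schema rather than searching its text avoids false matches inside comments or annotations.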

Problems in the Middle

These XML Schema design features were commonly used but may have problematic tool support. It is a good idea to check with the tools you plan to use before adding these features to your design.

  • attributeGroup: A grouping of attributes by name for reusability, similar to xsd:group. This feature may have shown up in the test cases more often than it is actually used. First, one organization used them heavily, obscuring the fact that the others tended to avoid them. Second, the analysis searched for "xsd:attributeGroup", which matches both declarations and reuses (with "@ref"). So the actual number of attributeGroups may be much smaller. My experience with tools was that attributeGroups were not hard to support, but simply weren't the highest priority.
  • xsd:choice: The use of the xsd:choice compositor. Some tool makers have expressed concern about this feature because it is not easily mapped to a programming construct. However, its usage is common.
  • Default values: Declaring default values for data in the XML instance. I've blogged about default values before.
  • xsd:union: The use of xsd:union to combine types in a declaration. I've found this is the least supported feature of XML Schema in tools.
  • Pattern: Uses regular expressions to subset strings, among other things.
  • Other facets: Includes facets other than pattern and enumeration, namely minInclusive, maxInclusive, maxExclusive, minExclusive, whiteSpace, fractionDigits, length, minLength, and maxLength. Again, tool support varies.
  • List types: The use of the xsd:list element. This feature only occurred in about 10 percent of the test cases. It is sometimes unsupported in tools and may be a cause of concern. I've heard many complaints from coders about programming to parse through and process list types. They much preferred the use of enumerations or separate data types.
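
The attributeGroup caveat above comes down to telling declarations apart from references. One way to separate the two is to parse the schema and inspect each attributeGroup element, as in the sketch below. The sample schema, the AuditAttributes group, and the count_attribute_groups function are hypothetical and only illustrate the distinction.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: one attributeGroup declaration, reused by two types.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:attributeGroup name="AuditAttributes">
    <xsd:attribute name="createdBy" type="xsd:string"/>
    <xsd:attribute name="createdOn" type="xsd:date"/>
  </xsd:attributeGroup>
  <xsd:complexType name="OrderType">
    <xsd:sequence>
      <xsd:element name="Id" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attributeGroup ref="AuditAttributes"/>
  </xsd:complexType>
  <xsd:complexType name="InvoiceType">
    <xsd:sequence>
      <xsd:element name="Id" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attributeGroup ref="AuditAttributes"/>
  </xsd:complexType>
</xsd:schema>
"""


def count_attribute_groups(schema_text):
    """Count attributeGroup declarations (@name) separately from references (@ref)."""
    root = ET.fromstring(schema_text)
    declarations = 0
    references = 0
    for group in root.iter(f"{{{XSD_NS}}}attributeGroup"):
        if group.get("ref") is not None:
            references += 1
        elif group.get("name") is not None:
            declarations += 1
    return {"declarations": declarations, "references": references}


counts = count_attribute_groups(SAMPLE)
# counts == {"declarations": 1, "references": 2}
```

In this sample a search that does not distinguish the two would report three attributeGroup elements, although only one group is declared.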

A Note on Wildcards

Wildcards fell in the middle of the list; however, I suspect that they are actually used much more frequently. Some of these consortia create a single wildcard extension element which is simply referred to (with "@ref") as needed. So the number of wildcard declarations is lower than the number of places where those elements are used.
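
The gap between wildcard declarations and wildcard usage can be illustrated with a short Python sketch. Everything named here is an assumption made for the example: the UserArea extension element, the sample schema, and the wildcard_usage function. Matching references by local name alone is a simplification that ignores namespace prefixes.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: one wildcard, wrapped in a global element reused by @ref.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/hr"
            targetNamespace="http://example.org/hr"
            elementFormDefault="qualified">
  <xsd:element name="UserArea">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:any namespace="##any" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element ref="UserArea" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="PositionType">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element ref="UserArea" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def tag(local_name):
    """Build the namespace-qualified tag name that ElementTree expects."""
    return f"{{{XSD_NS}}}{local_name}"


def wildcard_usage(schema_text):
    """Compare the number of xsd:any wildcards with references to their carriers."""
    root = ET.fromstring(schema_text)
    declared = sum(1 for _ in root.iter(tag("any")))
    # Global elements (direct children of xsd:schema) that contain a wildcard.
    carriers = {
        element.get("name")
        for element in root.findall(tag("element"))
        if element.find(f".//{tag('any')}") is not None
    }
    # Simplification: compare local names only, dropping any prefix on @ref.
    referenced = sum(
        1
        for element in root.iter(tag("element"))
        if element.get("ref") is not None
        and element.get("ref").split(":")[-1] in carriers
    )
    return {"wildcards_declared": declared, "carrier_references": referenced}


usage = wildcard_usage(SAMPLE)
# usage == {"wildcards_declared": 1, "carrier_references": 2}
```

A count of xsd:any alone reports one wildcard for this schema, while two content models are actually open to extension.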
