Profiling XML Schema
by Paul Kiel
The Results: A Profile of XML Schema
The usage patterns of the 1,414 schemas tested show a clear tendency toward simplicity. Just six design features are used in at least one third of the schemas, while 17 features occur in 10 percent or less of the test cases. Many of the avoided constructs are useful only in very specific situations. In addition to simplicity, explicitness is a secondary pattern, reflected in the high level of element namespace qualification, the lack of mixed content models and abstract types, and the overwhelming preference for xsd:sequence over xsd:all compositors.
Features Avoided (Occurring in Less than 10 Percent of the Schemas)
These XML Schema design features were either used minimally or not at all in the schemas tested.
- xsd:all: The use of the xsd:all compositor. The clear preference is to use xsd:choice or xsd:sequence instead.
- Finalizing: The use of the @final or @finalDefault attributes. None of the tested schemas used these. The test cases seem to focus on enabling features rather than disabling ones such as these.
- substitutionGroup: Allows elements to be substituted for other elements. While this feature was not used in the schemas tested, it is a common extension mechanism. The lack of substitutionGroups is probably due to the nature of the test cases. Open, standard domain consortia schemas may not need this feature, although organizations implementing them may need substitutionGroups for extensibility.
- Uniqueness: The use of the unique element requiring the contents to be distinctive within its scope. I found this interesting, as the need for unique IDs was common; however, most often simple strings were used for that data. Perhaps uniqueness is enforced in the business layer of a data transfer between systems.
- Qualified attributes: This is the use of attributeFormDefault="qualified". Almost none of the schema designers felt the need to do this, although the vast majority qualified the elements.
- Keys: The use of the key and keyref elements.
- Redefine: Using the redefine element to change the definition of an existing component. This feature is also among the least supported in tools. Consistently avoided.
- Nillable: The use of the @nillable attribute, allowing the use of xsi:nil in the instance indicating the contents have a null value.
- Block: The use of the @block attribute to disallow derivations.
- complexType restriction: Restricting the content model of a complexType. Years ago at HR-XML, we looked at this feature to enable us to have a generalized data type that is constrained depending on the context of its usage. However, we found restricting complexTypes to be cumbersome, verbose, and not well supported at that time.
- Abstract types: The use of abstract="true" on elements or types.
- Mixed: Setting the attribute mixed="true" combines data and child elements in one place. Schema designers have clearly separated these concepts into distinct types.
- Groups: The use of xsd:group is a way to define a group for later reuse. It may be that elements were simply referred to with "@ref" rather than put into groups.
- Fixed values: The @fixed attribute on elements, attributes, or simpleTypes.
- No @targetNamespace attribute: Having no @targetNamespace may be used in late binding of schemas. However, the vast majority of test cases used it. I have almost always seen it as a requirement in schema design guides.
- No default namespace declared: Containing no "@xmlns" default namespace (no prefix). Again, this may be used in late binding. Default namespaces were consistently used in the test cases.
- Default namespace not equal to @targetNamespace: This occurs when the default namespace value does not match the @targetNamespace. Not only did the vast majority of schemas have both a default and a @targetNamespace, but they were the same value. This reflects the tendency toward simplicity.
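For readers who have not met the identity constraints mentioned above, the following is a minimal sketch of what an xsd:unique declaration might look like for the "unique ID" case. The namespace URI, the hr prefix, and the Staff, Employee, and Id names are all hypothetical and are not taken from any of the tested schemas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: namespace and element names are hypothetical. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/hr"
            xmlns:hr="http://example.org/hr"
            targetNamespace="http://example.org/hr"
            elementFormDefault="qualified">

  <xsd:element name="Staff">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- The ID itself is still a simple string. -->
              <xsd:element name="Id" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

    <!-- Every Employee/Id value must be distinct within one Staff element. -->
    <xsd:unique name="UniqueEmployeeId">
      <xsd:selector xpath="hr:Employee"/>
      <xsd:field xpath="hr:Id"/>
    </xsd:unique>
  </xsd:element>

</xsd:schema>
```

The selector and field XPath expressions need an explicit prefix (hr here) because unprefixed names in those expressions do not pick up the default namespace. That extra bookkeeping may be one reason designers left uniqueness to the business layer, though this is speculation rather than something the test data shows.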
Features Used Frequently (Occurring in at Least One-Third of the Schemas)
These are the most commonly used XML Schema design features. Here again, simplicity rules. It is wise to begin schema design with this toolset.
- Namespace qualified elements: Use elementFormDefault="qualified" for explicitness of the element namespaces.
- xsd:sequence: The use of the xsd:sequence element. The most common compositor. I've recommended this over xsd:all because it leads to fewer ambiguous content models and its child elements occur in a predictable order.
- complexType extension: Creating a type that extends another is a key reusability and extensibility point.
- Anonymous types: These occur when types are created that are locally scoped and thus have no "@name" attribute. Very commonly done. Tools may prefer no anonymous types, but none I have tried was unable to accommodate them.
- simpleType restriction: Derivation of a simpleType restricting its base type.
- Enumerations: Enumerated values were the most frequently used construct.
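As a rough illustration, a schema limited to the six features above might look like the sketch below. The namespace URI and every element and type name are hypothetical, chosen only to show the constructs side by side; none come from the tested schemas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: namespace and names are hypothetical. -->
<!-- The default namespace matches targetNamespace, and elements are qualified. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/hr"
            targetNamespace="http://example.org/hr"
            elementFormDefault="qualified">

  <!-- simpleType restriction using enumerations -->
  <xsd:simpleType name="EmploymentStatusType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="FullTime"/>
      <xsd:enumeration value="PartTime"/>
      <xsd:enumeration value="Contract"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Named base type whose children appear in a fixed order (xsd:sequence) -->
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="GivenName" type="xsd:string"/>
      <xsd:element name="FamilyName" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- complexType extension: EmployeeType adds content after PersonType's -->
  <xsd:complexType name="EmployeeType">
    <xsd:complexContent>
      <xsd:extension base="PersonType">
        <xsd:sequence>
          <xsd:element name="Status" type="EmploymentStatusType"/>
          <!-- Anonymous type: locally scoped, so it carries no name attribute -->
          <xsd:element name="Position">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="Title" type="xsd:string"/>
                <xsd:element name="StartDate" type="xsd:date"/>
              </xsd:sequence>
            </xsd:complexType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="Employee" type="EmployeeType"/>

</xsd:schema>
```

Because the default namespace equals the target namespace, references such as base="PersonType" need no prefix, which is the simple arrangement most of the test cases followed.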
Problems in the Middle
These XML Schema design features were commonly used but may have problematic tool support. It is a good idea to check with the tools you plan to use before adding these features to your design.
- attributeGroup: A grouping of attributes by name for reusability, similar to xsd:group. This feature may have shown up in the test cases more than is actually used. First, one organization used them heavily, obscuring the fact that the others tended to avoid them. Second, the analysis searched for "xsd:attributeGroup", resulting in matches for both declarations and reuse or "@ref". So the actual number of these may be much smaller. My experience with tools was that attributeGroups were not hard to support, but simply weren't the highest priority.
- xsd:choice: The use of the xsd:choice compositor. Some tool makers have expressed concern about this feature because it is not easily mapped to a programming construct. However, its usage is common.
- Default values: Declaring default values for data in the XML instance. I've blogged about default values before.
- xsd:union: The use of xsd:union to combine types in a declaration. I've found this is the least supported feature of XML Schema in tools.
- Pattern: Uses regular expressions to subset strings, among other things.
- Other facets: Includes facets other than pattern and enumeration, namely minInclusive, maxInclusive, minExclusive, maxExclusive, whiteSpace, fractionDigits, length, minLength, and maxLength. Again, tool support varies.
- List types: The use of the xsd:list element. This feature only occurred in about 10 percent of the test cases. It is sometimes unsupported in tools and may be a cause of concern. I've heard many complaints from coders about programming to parse through and process list types. They much preferred the use of enumerations or separate data types.
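To make the tool-support questions concrete, here is a small sketch of several of the constructs above: a pattern facet, an xsd:union, an xsd:list, and an xsd:choice with a default attribute value. All type, element, and attribute names and the namespace URI are hypothetical. These fragments are reasonable candidates to run through any data-binding tool you plan to use before committing to the features.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: namespace and names are hypothetical. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/hr"
            targetNamespace="http://example.org/hr"
            elementFormDefault="qualified">

  <!-- Pattern facet: five digits, optionally followed by a hyphen and four more -->
  <xsd:simpleType name="PostalCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- xsd:union: the value is either a date or the literal string "Unknown" -->
  <xsd:simpleType name="DateOrUnknownType">
    <xsd:union memberTypes="xsd:date">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="Unknown"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>

  <!-- xsd:list: whitespace-separated tokens held in a single value -->
  <xsd:simpleType name="SkillCodeListType">
    <xsd:list itemType="xsd:NMTOKEN"/>
  </xsd:simpleType>

  <!-- xsd:choice: exactly one of Email or Phone; the attribute has a default -->
  <xsd:complexType name="ContactType">
    <xsd:choice>
      <xsd:element name="Email" type="xsd:string"/>
      <xsd:element name="Phone" type="xsd:string"/>
    </xsd:choice>
    <xsd:attribute name="preferred" type="xsd:boolean" default="false"/>
  </xsd:complexType>

</xsd:schema>
```

The list type shows where the parsing complaints come from: an instance value such as "JAVA SQL XSLT" arrives as one string that the receiving code must split, whereas repeated elements would arrive already separated.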
A Note on Wildcards
These were in the middle of the list; however, I suspect that they are actually used much more frequently. Some of these consortia create a single wildcard extension element which is simply referred to (with "@ref") as needed. So the count of wildcard declarations understates how often those elements are actually used.
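The pattern can be sketched as follows, assuming a hypothetical extension element named UserArea and a hypothetical namespace. The wildcard is declared once, so a search for xsd:any finds a single match no matter how many types refer to it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: namespace and names are hypothetical. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/hr"
            targetNamespace="http://example.org/hr"
            elementFormDefault="qualified">

  <!-- The only xsd:any in the schema: one global extension element -->
  <xsd:element name="UserArea">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Each type that allows extension refers to the shared element by ref -->
  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="FamilyName" type="xsd:string"/>
      <xsd:element ref="UserArea" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="PositionType">
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element ref="UserArea" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Here two types accept extension content, yet a feature count based on xsd:any would record one wildcard.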