WSDL Tales From the Trenches, Part 3

August 5, 2003

Defining Data

This article is the third and final part of the WSDL Tales from the Trenches series, and in it I concentrate on the data in web services. More specifically, I examine the type definitions and element declarations in the types element of a WSDL document. Such types and elements are for use in the abstract messages, the message elements in a WSD.

WSDL does not constrain data definitions to W3C XML Schema (WXS). However, alternatives to WXS are not covered in this article: the goal of the series is to provide help and guidance with current real-world problems, and I have not seen any of the alternatives to WXS being used for web services on a significant scale to date. This may change in the future: while only the WXS implementation is discussed in the WSDL 1.1 spec, it was always the intention of the WSDL designers to provide several options. The WSDL 1.2 draft's appendix on Relax NG brings this closer to realization.

Data modeling with WXS is not for the faint-hearted. It presents a lot of pitfalls. This article will point some of these out and help you avoid them. At the very least, it should caution you to tread carefully. I will not attempt to explain WXS. There is a wealth of good texts that do so; this article focuses on how to do basic data modeling for web services. Many of the more advanced topics are avoided.

Importing data definitions

Data may be defined directly in the types element of a document containing abstract messages. The recommended practice, however, is to import a separate document; the previous installment discussed the increased readability, extendibility and opportunities for reuse this brings.

This can be done by using WSDL's import element or by using WXS's. Although they have the same name, they are different elements as they reside in different namespaces. In order to distinguish them, I will refer to WSDL's element as wsdl:import and use xsd:import to denote that of WXS. I will explain the difference between them with examples of both mechanisms.

stockquote.wsdl uses wsdl:import to import another WSDL document that only contains data definitions. In other words, the only top level element is types.
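A minimal sketch of what such an importing document might look like. All URIs, the file name data.wsdl and the element name TradePriceRequest are assumptions made for illustration. The imported document is assumed to contain nothing but a types element, with its schema using the same namespace URI as the imported WSDL document itself.

```xml
<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote/definitions"
    xmlns:tns="http://example.com/stockquote/definitions"
    xmlns:data="http://example.com/stockquote/data"
    xmlns="http://schemas.xmlsoap.org/wsdl/">

  <!-- wsdl:import pulls in another WSDL document; note the attribute
       is called location, and the namespace differs from tns -->
  <import namespace="http://example.com/stockquote/data"
          location="http://example.com/stockquote/data.wsdl"/>

  <!-- the abstract message refers to an element declared in the
       types element of the imported document (hypothetical name) -->
  <message name="GetLastTradePriceInput">
    <part name="body" element="data:TradePriceRequest"/>
  </message>

</definitions>
```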

The second variant does not import a WSDL document but a schema. In order to do so, it must use xsd:import as a child element of schema, which, in turn, is a child of types.
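A sketch of the schema-importing variant, under the same caveat: the URIs, the file name stockquote.xsd and the element name are hypothetical. The point to notice is where the import sits and that xsd:import uses schemaLocation rather than location.

```xml
<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote/definitions"
    xmlns:tns="http://example.com/stockquote/definitions"
    xmlns:data="http://example.com/stockquote/schemas"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://schemas.xmlsoap.org/wsdl/">

  <types>
    <!-- the enclosing schema's target namespace differs from the
         namespace being imported -->
    <xsd:schema targetNamespace="http://example.com/stockquote/definitions">
      <xsd:import namespace="http://example.com/stockquote/schemas"
                  schemaLocation="http://example.com/stockquote/stockquote.xsd"/>
    </xsd:schema>
  </types>

  <message name="GetLastTradePriceInput">
    <part name="body" element="data:TradePriceRequest"/>
  </message>

</definitions>
```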

Note that the WSDL 1.1 specification's example 2, a stockquote service, does not do either of these: it uses wsdl:import to import a schema at top level. However, WS-I (draft) basic profile clarifies the import mechanism in rules 2001 to 2004 and castigates the W3C Note for "... incorrectly show[ing] the WSDL import statement being used to import WXS definitions".

The above examples are in essence the same as those the WS-I basic profile offers as a correction to WSDL 1.1's, except that in the basic profile examples the imported and importing element have the same target namespace. In the case of xsd:import, this is wrong; the WXS spec does not allow it. In the case of wsdl:import, it is unfortunate; as pointed out in the previous installment, this is bad style and should have been disallowed.

If it takes several documents to define a schema with a single namespace, xsd:include or xsd:redefine should be used.

Schema Design Styles

This section is about which data definitions should be exposed and which should be hidden. The trade-off is between the potential for reuse and a narrow interface.

Rule 2203 of the basic profile stipulates that abstract message parts, bound to a concrete message transporting an RPC invocation, should be defined using the type attribute. Rule 2204 states that abstract message parts used in document-style invocation should have an element attribute. If you are using SOAP, it is a good idea to try to stick to these rules, even though it makes a mockery of the "abstract message" doctrine. Therefore, there must be an exposed type definition for data passed as a parameter to an RPC invocation and an exposed element declaration for a document-style invocation. In the latter case, this means that the types element may well end up with mainly element declarations and little or no type declarations. It looks confusing but, as Roald Dahl's BFG said, "what I mean and what I say are entirely different things."
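The difference between the two kinds of part can be sketched as follows. The message names, the type TickerSymbol and the element GetQuote are made up for illustration, and the data namespace is assumed to be defined elsewhere.

```xml
<definitions
    targetNamespace="http://example.com/quotes/definitions"
    xmlns:data="http://example.com/quotes/data"
    xmlns="http://schemas.xmlsoap.org/wsdl/">

  <!-- RPC style: each part refers to a type definition, so the
       schema must expose a named (global) type -->
  <message name="GetQuoteRpcRequest">
    <part name="symbol" type="data:TickerSymbol"/>
  </message>

  <!-- document style: the part refers to an element declaration,
       so the schema must expose a global element -->
  <message name="GetQuoteDocRequest">
    <part name="body" element="data:GetQuote"/>
  </message>

</definitions>
```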

The Russian doll design style defines root elements globally. Elements that cannot be a document's root are defined as the need arises and so are attributes and types; these definitions are nested in the definitions that use them. Such definitions nested inside another definition are said to be local and cannot be reused in other definitions, neither by other components in the same schema nor by external components. Moreover, type definitions are anonymous and cannot be referenced.
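As an illustration, a Russian doll schema for a small, hypothetical Quote document might look like this (the namespace URI and all element names are assumptions):

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    elementFormDefault="qualified"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- the only global component is the root element -->
  <xsd:element name="Quote">
    <!-- anonymous type: it has no name, so nothing can refer to it -->
    <xsd:complexType>
      <xsd:sequence>
        <!-- local elements, declared where they are needed -->
        <xsd:element name="Symbol" type="xsd:string"/>
        <xsd:element name="Price" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```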

A salami slice, on the other hand, declares all elements globally. A third design style is referred to as Venetian blinds. Venetian blinds define all types globally but only expose the elements that can be used as root element of a document.
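A salami slice rendering of a small, hypothetical Quote document could be sketched like this. The names and the namespace URI are assumptions. Every element is declared globally and the content model is assembled with ref.

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    xmlns:q="http://example.com/quotes/data"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- every element is a global declaration and can be reused -->
  <xsd:element name="Symbol" type="xsd:string"/>
  <xsd:element name="Price" type="xsd:decimal"/>

  <xsd:element name="Quote">
    <xsd:complexType>
      <xsd:sequence>
        <!-- references to the global declarations above -->
        <xsd:element ref="q:Symbol"/>
        <xsd:element ref="q:Price"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```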

Examples 1, 2 and 3 illustrate the respective styles. All three have this instance document among their productions.

Clearly, none of these styles is optimal with respect to the trade-off presented. However, it is instructive to contrast the three styles with respect to the set criteria. In a web services context, the equivalent of appearing as a root element of a document is to occur as the value of the element attribute on an abstract message part. Since neither Russian doll nor salami slice exposes types, they cannot be used if you want to do RPC-style invocation. Venetian blinds, on the other hand, works with both RPC and document-style invocation. Venetian blinds encourages the reuse of types since it defines them all globally. However, some types may not be intended for reuse, and their global definition makes the interface less narrow.
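A Venetian blinds sketch of a hypothetical Quote schema shows why this style serves both invocation styles: a message part could refer to the global element Quote via its element attribute, or to the named types via its type attribute. All names and the namespace URI are assumptions.

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    elementFormDefault="qualified"
    xmlns:q="http://example.com/quotes/data"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- all types are named and global, hence reusable -->
  <xsd:simpleType name="TickerSymbol">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="8"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="QuoteType">
    <xsd:sequence>
      <!-- local elements that use global types -->
      <xsd:element name="Symbol" type="q:TickerSymbol"/>
      <xsd:element name="Price" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- only the root element is declared globally -->
  <xsd:element name="Quote" type="q:QuoteType"/>

</xsd:schema>
```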

For a document style web service, Russian doll could not be improved upon if the only objective were a narrow interface. It does not score well on the reuse front though. Salami slice sits at the other end of this spectrum with a high score for reuse and a low one for narrowness.

Namespaces

Namespaces were discussed briefly in the previous installment. There we asked the question of what goes into the WSD's target namespace. Here I address the question of what goes into a W3C XML Schema namespace. The rules were briefly reviewed in the previous article, but here we go into more detail with the aid of some examples.

Elements, types and attributes that belong to a namespace are said to be qualified. The declaration of a target namespace is a necessary, but not sufficient condition for elements, types and attributes to be qualified. So when are they qualified and when unqualified?

Let us deal with types first, since they are easy: globally defined types, both simple and complex, are always qualified. Locally defined types are anonymous and so there is no way of referencing them; the question of which namespace they belong to is purely academic.

Global element declarations are also easy: globally declared elements are qualified.

To illustrate what we know so far, this instance document is validated by this schema. We see indeed that the two globally defined elements Element and Response are part of the target namespace; the locally defined Collection element is not.
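The situation can be sketched with a hypothetical schema. Only the element names Response and Collection come from the text; the namespace URI and the content model are assumptions. With no form attribute and no elementFormDefault, the local element stays outside the target namespace.

```xml
<!-- An instance this schema should accept: Response carries the
     target namespace, the local Collection element has none.

     <d:Response xmlns:d="http://example.com/ns/demo">
       <Collection>some content</Collection>
     </d:Response>
-->
<xsd:schema
    targetNamespace="http://example.com/ns/demo"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- global declaration: always qualified -->
  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <!-- local declaration: unqualified by default -->
        <xsd:element name="Collection" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```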

Whether or not attributes and locally defined elements are qualified is governed by the form attribute. The attribute can take two values: qualified and unqualified. Therefore, in order to qualify the Collection element in our previous example, it can be reworked like so. You will find that it validates this document.

form is not a required attribute, either for attribute declarations or for local element declarations. When it is absent, its value is taken from the attributeFormDefault or elementFormDefault attribute on the schema element, respectively, or, failing that, from the default value of those attributes, which is unqualified in each case. So here is another schema that validates the document.
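Both mechanisms can be sketched in one hypothetical schema. The namespace URI, the Note element and the id attribute are assumptions added for illustration. The schema-wide default qualifies local elements, while an explicit form overrides it for a single declaration.

```xml
<!-- An instance this schema should accept:

     <d:Response xmlns:d="http://example.com/ns/demo" id="r1">
       <d:Collection>some content</d:Collection>
       <Note>not in any namespace</Note>
     </d:Response>
-->
<xsd:schema
    targetNamespace="http://example.com/ns/demo"
    elementFormDefault="qualified"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <!-- no form attribute: takes elementFormDefault, so qualified -->
        <xsd:element name="Collection" type="xsd:string"/>
        <!-- explicit form overrides the schema-wide default -->
        <xsd:element name="Note" type="xsd:string" form="unqualified"/>
      </xsd:sequence>
      <!-- attributeFormDefault is not set, so id is unqualified -->
      <xsd:attribute name="id" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```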

Note that the Russian doll and Venetian blinds example schemas must stipulate that elements are qualified by default in order to validate the same instance document as the salami slice example.

WSDL 1.1 recommends setting the elementFormDefault to qualified and keeping the default for attributeFormDefault. This should minimize the use of explicit namespace qualifiers if you judiciously set the schema's target namespace as the default namespace in your messages.

We have only skimmed the surface here; W3C XML Schema (see Resources for a full reference) devotes a complete chapter to controlling namespaces. However, the questions that you will most likely encounter are covered.

Compositors

W3C XML Schema has three compositor elements that construct complex data types from simpler ones: sequence, choice and all. Particles are nested inside compositor elements.

A sequence defines a compound structure in which the particles occur in order. The particles within a choice are mutually exclusive. However, there may be multiple occurrences of the chosen particle. all defines an unordered group. For all three compositors, the number of legal occurrences of the particles within them is governed by the maxOccurs and minOccurs attributes on those particles. These attributes are not required and their default value is 1.

The simplest particle is an element. sequence and choice can both act as particles too. all cannot.
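A small sketch of compositors and occurrence constraints at work, using a hypothetical Order type (all names and the namespace URI are assumptions). The choice is itself a particle within the sequence.

```xml
<xsd:schema
    targetNamespace="http://example.com/orders"
    elementFormDefault="qualified"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:complexType name="Order">
    <xsd:sequence>
      <!-- minOccurs and maxOccurs both default to 1 -->
      <xsd:element name="Symbol" type="xsd:string"/>
      <xsd:element name="Quantity" type="xsd:positiveInteger"/>
      <!-- a choice acting as a particle: exactly one of the two -->
      <xsd:choice>
        <xsd:element name="LimitPrice" type="xsd:decimal"/>
        <xsd:element name="AtMarket" type="xsd:boolean"/>
      </xsd:choice>
      <!-- optional and repeatable -->
      <xsd:element name="Note" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```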

The sequence compositor is the one that is most often encountered in WSDs. This seems a good choice; even if, conceptually, particles could occur in any order, nailing down the order will make parsing of messages that bit easier. However, implementations often do not observe the order constraints. This can be shown by invoking a web service with elements in a different order from the one laid down by a sequence: it often does not seem to matter. That is not such a bad thing. After all, if the server is more liberal in what it accepts than it strictly needs to be, this does not harm well-behaved clients and it offers some margin for error on more sloppily implemented clients. In other words, a server that did this can hardly be accused of being in breach of contract. Not so if the server cannot guarantee the order of the particles that are being sent back. Faced with such a server implementation, I spent a good deal of time working through the ramifications of this once upon a time.

The first reflex is to replace sequence with all compositors. However, be aware that the remedy is not without its problems since the expressiveness of this compositor has been severely curtailed in the WXS spec. A detailed account of why this is so and what the precise constraints are, is beyond the current scope. However, the main limitation has already been pointed out: all cannot be used as a particle. Since derivation by extension in effect uses the compositor of the base type as a particle in the subtype, opportunities for reuse of types defined with all are limited. Derivation is covered in further detail in a dedicated section.
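For reference, a type defined with all might be sketched as follows (hypothetical names and namespace URI). In WXS 1.0 the all group has to be the whole content model of the type, and each element in it may occur at most once.

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    elementFormDefault="qualified"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:complexType name="UnorderedQuote">
    <!-- children may appear in any order -->
    <xsd:all>
      <xsd:element name="Symbol" type="xsd:string"/>
      <xsd:element name="Price" type="xsd:decimal"/>
      <!-- minOccurs may be 0 or 1; maxOccurs must stay at 1 -->
      <xsd:element name="Time" type="xsd:dateTime" minOccurs="0"/>
    </xsd:all>
  </xsd:complexType>

</xsd:schema>
```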

Schema versions

The current WXS Recommendation is 1.0 and its namespace is http://www.w3.org/2001/XMLSchema. However, some implementations still being used today follow the specifications of previous working drafts, e.g. http://www.w3.org/1999/XMLSchema. This is unfortunate and the perpetrators should be encouraged to migrate to the released standard, but if you should come across such implementations, here are two of the common pitfalls. Firstly, there is a WXS data type in common use that has changed from the 1999 to the 2001 version: 1999's timeInstant became 2001's dateTime. Make sure that the data type you use fits the version of WXS. Secondly, derivations also changed significantly between 1999 and 2001. These will be covered in the following section.
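The first pitfall, sketched with a hypothetical TradeTime element: the built-in type name has to match the schema namespace that the xsd prefix is bound to.

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- 2001 Recommendation namespace: the type is called dateTime -->
  <xsd:element name="TradeTime" type="xsd:dateTime"/>

  <!-- Had the xsd prefix been bound to
       http://www.w3.org/1999/XMLSchema instead, the declaration
       would have to use type="xsd:timeInstant" -->

</xsd:schema>
```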

Derivations

Derivation is a technique to define subtypes of a given base type. There are two kinds of derivation in WXS: extension and restriction. The former adds components at the end of the content model of the base type, the latter constrains the base type. Hence valid instances of a subtype derived by extension are not necessarily valid instances of the base type. Valid instances of a subtype derived by restriction, on the other hand, are always valid instances of the base type.
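Derivation by extension, sketched with hypothetical types (the names and namespace URI are assumptions). The derived type's content model is that of the base type followed by the added particles, so a TimedQuote instance contains an element that the Quote type does not allow.

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    elementFormDefault="qualified"
    xmlns:q="http://example.com/quotes/data"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:complexType name="Quote">
    <xsd:sequence>
      <xsd:element name="Symbol" type="xsd:string"/>
      <xsd:element name="Price" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="TimedQuote">
    <xsd:complexContent>
      <xsd:extension base="q:Quote">
        <!-- appended after Symbol and Price -->
        <xsd:sequence>
          <xsd:element name="Time" type="xsd:dateTime"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

</xsd:schema>
```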

A subtype may be used anywhere where its base type is used, unless otherwise specified. This may have the following impact on message definitions: assume that a message definition declares a part with type Foo, and Bar is derived by extension from Foo. A party may send an element of type Bar in such a message. The recipient may be unable to validate this message. Fortunately, it is possible to turn off the ability to substitute subtypes for base types by using the block attribute on the base type or on an element declared to be of a given base type.
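A sketch of the block attribute in use. Foo and Bar are the types from the text; their content, the element name Item and the namespace URI are assumptions. In an instance, the substitution is requested with xsi:type.

```xml
<!-- With block="extension" on Foo, a validator should reject:

     <t:Item xmlns:t="http://example.com/foo"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:type="t:Bar"> ... </t:Item>
-->
<xsd:schema
    targetNamespace="http://example.com/foo"
    elementFormDefault="qualified"
    xmlns:t="http://example.com/foo"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- types derived from Foo by extension may not stand in for it -->
  <xsd:complexType name="Foo" block="extension">
    <xsd:sequence>
      <xsd:element name="Id" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Bar">
    <xsd:complexContent>
      <xsd:extension base="t:Foo">
        <xsd:sequence>
          <xsd:element name="Extra" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- block could alternatively be set on this declaration only -->
  <xsd:element name="Item" type="t:Foo"/>

</xsd:schema>
```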

Beware of derivation by extension: that is the message of this section so far. But what about derivation by restriction? From the discussion so far, it seems reasonable enough. However, it may seem less attractive once you realize that each particle of the subtype's content model must be listed explicitly. This makes for very verbose definitions. It also does not bring the modularity benefits that an inheritance hierarchy in an OO programming language might bring: common features are not factored out, but must be repeated in each subtype. This is a change with respect to the 1999 W3C XML Schema draft that caused a good deal of confusion.
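The verbosity is easy to see in a sketch with hypothetical address types (names and namespace URI are assumptions). The restricted type only drops one optional element, yet it must restate everything it keeps.

```xml
<xsd:schema
    targetNamespace="http://example.com/addresses"
    elementFormDefault="qualified"
    xmlns:a="http://example.com/addresses"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="Country" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="DomesticAddress">
    <xsd:complexContent>
      <xsd:restriction base="a:Address">
        <xsd:sequence>
          <!-- unchanged, but must be repeated all the same -->
          <xsd:element name="Street" type="xsd:string"/>
          <xsd:element name="City" type="xsd:string"/>
          <!-- Country is left out, which is legal because it is
               optional in the base type -->
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>

</xsd:schema>
```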

Arrays

Defining an array is one of the most confusing issues in WSDL. It has also caused a great many interoperability problems, so proceed with caution. A common approach is to extend the Array type defined in the SOAP encoding schema. In fact, this is mandated by WSDL 1.1 (see section 2.2). I was therefore surprised to see that rules 2110 through 2112 of the WS-I Basic Profile overrule this. On the other hand, I understand the working group's position: WSDL 1.1 makes a pig's ear of array specifications. The basic profile's approach, by contrast, is simple.

When I originally planned this article, it was my intention to write a good deal about SOAP arrays and how to use them in WSDs that are as near correct as is possible given the flaws in WSDL 1.1. However, given the basic profile's recommendation, the sensible thing is to avoid them altogether.
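A sketch of what avoiding SOAP arrays can look like in plain WXS, as I read the basic profile's approach: a repeating element inside a sequence, with no reference to the SOAP encoding schema. The type and element names and the namespace URI are assumptions.

```xml
<xsd:schema
    targetNamespace="http://example.com/quotes/data"
    elementFormDefault="qualified"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- a list of symbols, expressed with occurrence constraints only -->
  <xsd:complexType name="SymbolList">
    <xsd:sequence>
      <xsd:element name="Symbol" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```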

Conclusions

The purpose of this article was to flag some of the issues that require attention when modeling data. You should underestimate neither the importance of defining data nor the complexity of the task. It is important because the data passed across the web service interface largely determine the quality of the interface. It is complex because data modeling is inherently complex. Nonetheless, I cannot help feeling that W3C XML Schema 1.0 does not mitigate this complexity adequately. I look forward to tools better suited to data modeling for web services.

Resources

The W3C has published two normative documents on the XML Schema: XML Schema Part 1: Structures and XML Schema Part 2: Datatypes. There is also a non-normative primer.

XML Schema by Eric van der Vlist, published by O'Reilly, 2002, proved to be an invaluable companion in my encounters with W3C XML Schema. Warmly recommended to anyone who is serious about data modeling with WXS.

xFront has an item on global versus local element and type declarations in its excellent best practices section. While you are browsing xFront, do have a look at what they have to say about web services as well, which is controversial and thought-provoking.