WSDL Tales From the Trenches, Part 3
August 5, 2003
Defining Data
This article is the third and final part of the WSDL Tales from the Trenches series, and in it I concentrate on the data in web services. More specifically, I examine the type definitions and element declarations in the types element of a WSDL document. Such types and elements are for use in the abstract messages, the message elements in a WSD.
WSDL does not constrain data definitions to W3C XML Schema (WXS). However, alternatives to WXS are not covered in this article: the goal of the series is to provide help and guidance with current real-world problems, and I have not seen any of the alternatives to WXS being used for web services on a significant scale to date. This may change in the future: while only the WXS implementation is discussed in the WSDL 1.1 spec, it was always the intention of the WSDL designers to provide several options. The WSDL 1.2 draft's appendix on Relax NG brings this closer to realization.
Data modeling with WXS is not for the faint-hearted. It presents a lot of pitfalls. This article will point some of these out and help you avoid them. At the very least, it should caution you to tread carefully. I will not attempt to explain WXS. There is a wealth of good texts that do so; this article focuses on how to do basic data modeling for web services. Many of the more advanced topics are avoided.
Importing data definitions
Data may be defined directly in the types element of a document containing abstract messages. The recommended practice, however, is to import a separate document; the previous installment discussed the increased readability, extensibility and opportunities for reuse this brings.
This can be done by using WSDL's import element or by using WXS's. Although they have the same name, they are different elements as they reside in different namespaces. In order to distinguish them, I will refer to WSDL's element as wsdl:import and use xsd:import to denote that of WXS. I will explain the difference between them with examples of both mechanisms.
The first variant, stockquote.wsdl, uses wsdl:import to import another WSDL document that contains only data definitions. In other words, the only top-level element of the imported document is types.
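As a minimal sketch of this first mechanism, an importing document could look as follows. The example.com namespaces, the file name types.wsdl and the TradePriceRequest element are hypothetical. The sketch assumes that types.wsdl is a definitions document whose target namespace matches the import's namespace attribute and whose only child is a types element holding a schema with that same target namespace.

```xml
<definitions name="StockQuote"
             targetNamespace="http://example.com/stockquote/definitions"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:types="http://example.com/stockquote/types">

  <!-- wsdl:import pulls in another WSDL document, not a schema.
       Assumption: types.wsdl contains nothing but a types element. -->
  <import namespace="http://example.com/stockquote/types"
          location="http://example.com/stockquote/types.wsdl"/>

  <!-- The abstract message refers to an element declared in the imported data definitions. -->
  <message name="GetLastTradePriceInput">
    <part name="body" element="types:TradePriceRequest"/>
  </message>
</definitions>
```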
The second variant does not import a WSDL document but a schema. In order to do so, it must use xsd:import as a child element of schema, which, in turn, is a child of types.
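The second mechanism can be sketched in the same way. Again, the namespaces, the file name stockquote.xsd and the TradePriceRequest element are hypothetical. Note that the wrapping schema element has a target namespace that differs from the namespace being imported.

```xml
<definitions name="StockQuote"
             targetNamespace="http://example.com/stockquote/definitions"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:q="http://example.com/stockquote/schema">

  <types>
    <!-- xsd:import must sit inside a schema element, which in turn sits inside types. -->
    <xsd:schema targetNamespace="http://example.com/stockquote/definitions">
      <xsd:import namespace="http://example.com/stockquote/schema"
                  schemaLocation="http://example.com/stockquote/stockquote.xsd"/>
    </xsd:schema>
  </types>

  <!-- Assumption: stockquote.xsd declares a global element TradePriceRequest. -->
  <message name="GetLastTradePriceInput">
    <part name="body" element="q:TradePriceRequest"/>
  </message>
</definitions>
```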
Note that the WSDL 1.1 specification's example 2, a stockquote service, does not do either of these: it uses wsdl:import to import a schema at top level. However, the WS-I (draft) basic profile clarifies the import mechanism in rules 2001 to 2004 and castigates the W3C Note for "... incorrectly show[ing] the WSDL import statement being used to import WXS definitions".
The above examples are in essence the same as those the WS-I basic profile offers as a correction to WSDL 1.1's, except that in the basic profile examples the imported and importing elements have the same target namespace. In the case of xsd:import, this is wrong; the WXS spec does not allow it. In the case of wsdl:import, it is unfortunate; as pointed out in the previous installment, this is bad style and should have been disallowed.
If it takes several documents to define a schema with a single namespace, xsd:include or xsd:redefine should be used.
Schema Design Styles
This section is about which data definitions should be exposed and which should be hidden. The trade-off is between the potential for reuse and a narrow interface.
Rule 2203 of the basic profile stipulates that abstract message parts, bound to a concrete message transporting an RPC invocation, should be defined using the type attribute. Rule 2204 states that abstract message parts used in document-style invocation should have an element attribute. If you are using SOAP, it is a good idea to try to stick to these rules, even though it makes a mockery of the "abstract message" doctrine.
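The difference between the two kinds of part can be sketched as follows. All names and namespaces are hypothetical, and the schema is inlined only to keep the sketch self-contained.

```xml
<definitions name="PartStyles"
             targetNamespace="urn:example:parts"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:q="urn:example:quotes">

  <types>
    <xsd:schema targetNamespace="urn:example:quotes"
                elementFormDefault="qualified">
      <!-- Exposed type definition, needed for the RPC-style part. -->
      <xsd:complexType name="QuoteType">
        <xsd:sequence>
          <xsd:element name="Symbol" type="xsd:string"/>
          <xsd:element name="Price" type="xsd:decimal"/>
        </xsd:sequence>
      </xsd:complexType>
      <!-- Exposed element declaration, needed for the document-style part. -->
      <xsd:element name="Quote" type="q:QuoteType"/>
    </xsd:schema>
  </types>

  <!-- Part of a message bound to an RPC invocation: type attribute. -->
  <message name="GetQuoteRpcResponse">
    <part name="result" type="q:QuoteType"/>
  </message>

  <!-- Part of a message bound to a document-style invocation: element attribute. -->
  <message name="GetQuoteDocResponse">
    <part name="body" element="q:Quote"/>
  </message>
</definitions>
```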
Therefore, there must be an exposed type definition for data passed as a parameter to an RPC invocation and an exposed element declaration for a document-style invocation. In the latter case, this means that the types element may well end up with mainly element declarations and few or no type definitions. It looks confusing but, as Roald Dahl's BFG said, "what I mean and what I say are entirely different things."
The Russian doll design style defines root elements globally. Elements that cannot be a document's root are defined as the need arises and so are attributes and types; these definitions are nested in the definitions that use them. Such definitions nested inside another definition are said to be local and cannot be reused in other definitions, either by other components in the same schema or by external components. Moreover, local type definitions are anonymous and cannot be referenced.
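A minimal Russian doll sketch for a hypothetical quote vocabulary (the names and the namespace are my own illustration):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:quotes"
            elementFormDefault="qualified">

  <!-- The only global declaration: the document root. -->
  <xsd:element name="Quote">
    <!-- Anonymous, local type: nothing else can refer to it. -->
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element declarations, nested where they are used. -->
        <xsd:element name="Symbol" type="xsd:string"/>
        <xsd:element name="Price" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- An instance this schema would accept:
<Quote xmlns="urn:example:quotes">
  <Symbol>XYZ</Symbol>
  <Price>12.50</Price>
</Quote>
-->
```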
The salami slice style, on the other hand, declares all elements globally. A third design style is referred to as Venetian blinds. Venetian blinds defines all types globally but only exposes the elements that can be used as the root element of a document.
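A salami slice sketch for a hypothetical quote vocabulary might look like this. Every element is declared globally and the content model assembles them by reference; the names and the namespace are assumptions made for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:q="urn:example:quotes"
            targetNamespace="urn:example:quotes">

  <!-- Each element is a global declaration, hence reusable
       and a potential document root. -->
  <xsd:element name="Symbol" type="xsd:string"/>
  <xsd:element name="Price" type="xsd:decimal"/>

  <xsd:element name="Quote">
    <xsd:complexType>
      <xsd:sequence>
        <!-- References to the global declarations above. -->
        <xsd:element ref="q:Symbol"/>
        <xsd:element ref="q:Price"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```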
Examples 1, 2 and 3 illustrate the respective styles. All three have this instance document among their productions.
Clearly, none of these styles is optimal with respect to the trade-off presented. However, it is instructive to contrast the three styles with respect to the set criteria. In a web services context, the equivalent of appearing as a root element of a document is to occur as the value of the element attribute on an abstract message part.
Since neither Russian doll nor salami slice exposes types, they cannot be used if you want to do RPC-style invocation. Venetian blinds, on the other hand, works with both RPC and document-style invocation. Venetian blinds encourages the reuse of types since it defines them all globally. However, some types may not be intended for reuse, and defining them globally makes the interface less narrow.
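A Venetian blinds sketch for a hypothetical quote vocabulary shows why this style serves both invocation styles: the named types are available to RPC-style parts and the single global element to document-style parts. The names and the namespace are assumptions made for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:q="urn:example:quotes"
            targetNamespace="urn:example:quotes"
            elementFormDefault="qualified">

  <!-- All types are global and named, so they can be reused
       and referenced from a part's type attribute. -->
  <xsd:simpleType name="SymbolType">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>

  <xsd:complexType name="QuoteType">
    <xsd:sequence>
      <!-- Elements that are not roots stay local. -->
      <xsd:element name="Symbol" type="q:SymbolType"/>
      <xsd:element name="Price" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Only the root element is declared globally. -->
  <xsd:element name="Quote" type="q:QuoteType"/>
</xsd:schema>
```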
For a document style web service, Russian doll could not be improved upon if the only objective were a narrow interface. It does not score well on the reuse front though. Salami slice sits at the other end of this spectrum with a high score for reuse and a low one for narrowness.
Namespaces
Namespaces were discussed briefly in the previous installment. There we asked what goes into the WSD's target namespace. Here I address the question of what goes into a W3C XML Schema namespace. The rules were briefly reviewed in the previous article, but here we go into more detail with the aid of some examples.
Elements, types and attributes that belong to a namespace are said to be qualified. The declaration of a target namespace is a necessary, but not sufficient condition for elements, types and attributes to be qualified. So when are they qualified and when unqualified?
Let us deal with types first, since they are easy: globally defined types, both simple and complex, are always qualified. Locally defined types are anonymous and so there is no way of referencing them; the question of which namespace they belong to is purely academic.
Global element declarations are also easy: globally declared elements are qualified.
To illustrate what we know so far, this instance document is validated by this schema. We see indeed that the two globally defined elements Element and Response are part of the target namespace; the locally defined Collection element is not.
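A sketch of such a schema and instance, under the assumption that a Response holds a Collection of Element items; the namespace and the nesting are my own illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:r="urn:example:responses"
            targetNamespace="urn:example:responses">

  <!-- Global declarations: always in the target namespace. -->
  <xsd:element name="Element" type="xsd:string"/>

  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local declaration without a form attribute: unqualified. -->
        <xsd:element name="Collection">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="r:Element" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- A matching instance; note that Collection carries no prefix:
<r:Response xmlns:r="urn:example:responses">
  <Collection>
    <r:Element>first</r:Element>
    <r:Element>second</r:Element>
  </Collection>
</r:Response>
-->
```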
Whether or not attributes and locally defined elements are qualified is governed by the form attribute. The attribute can take two values: qualified and unqualified. Therefore, in order to qualify the Collection element in our previous example, the schema can be reworked as follows. You will find that it validates this document.
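A sketch of the reworked declaration, with the same hypothetical namespace and nesting (a Response holding a Collection of Element items):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:r="urn:example:responses"
            targetNamespace="urn:example:responses">

  <xsd:element name="Element" type="xsd:string"/>

  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <!-- form="qualified" puts this local element in the target namespace. -->
        <xsd:element name="Collection" form="qualified">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="r:Element" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- A matching instance; Collection is now prefixed too:
<r:Response xmlns:r="urn:example:responses">
  <r:Collection>
    <r:Element>first</r:Element>
  </r:Collection>
</r:Response>
-->
```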
form is not a required attribute, either when declaring attributes or when declaring local elements. When it is omitted, form takes its value from the elementFormDefault attribute (for elements) or the attributeFormDefault attribute (for attributes) on the schema element or, if these are absent too, from their default value, which is unqualified in each case. So here is another schema that validates the document.
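A sketch of the schema-wide alternative, once more with a hypothetical namespace and a Response holding a Collection of Element items:

```xml
<!-- elementFormDefault="qualified" applies to every local element
     declaration that does not set form itself. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:r="urn:example:responses"
            targetNamespace="urn:example:responses"
            elementFormDefault="qualified">

  <xsd:element name="Element" type="xsd:string"/>

  <xsd:element name="Response">
    <xsd:complexType>
      <xsd:sequence>
        <!-- No form attribute here: qualified through the schema default. -->
        <xsd:element name="Collection">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="r:Element" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```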
Note that the Russian doll and Venetian blinds example schemas must stipulate that elements are qualified by default in order to validate the same instance document as the salami slice example.
WSDL 1.1 recommends setting the elementFormDefault to qualified and keeping the default for attributeFormDefault. This should minimize the use of explicit namespace qualifiers if you judiciously set the schema's target namespace as the default namespace in your messages.
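As a small illustration, assuming a schema with the hypothetical target namespace urn:example:responses and elementFormDefault set to qualified, a message body can declare that namespace as its default and dispense with prefixes altogether:

```xml
<!-- Every element is qualified, yet none needs a prefix. -->
<Response xmlns="urn:example:responses">
  <Collection>
    <Element>first</Element>
    <Element>second</Element>
  </Collection>
</Response>
```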
We have only skimmed the surface here; W3C XML Schema (see Resources for a full reference) devotes a complete chapter to controlling namespaces. However, the questions that you will most likely encounter are covered.
Compositors
W3C XML Schema has three compositor elements that construct complex data types from simpler ones: sequence, choice and all.
Particles are nested inside compositor elements.
A sequence defines a compound structure in which the particles occur in order. The particles within a choice are mutually exclusive; however, there may be multiple occurrences of the chosen particle. The all compositor defines an unordered group. For all three compositors, the number of legal occurrences of the particles within them is governed by the maxOccurs and minOccurs attributes on those particles. These attributes are not required and their default value is 1. Within all, however, WXS 1.0 does not allow maxOccurs to exceed 1.
The simplest particle is an element. sequence and choice can both act as particles too. all cannot.
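The three compositors can be sketched side by side. The order and address vocabulary, like the namespace, is hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:orders"
            elementFormDefault="qualified">

  <xsd:complexType name="OrderType">
    <!-- sequence: the particles must appear in this order. -->
    <xsd:sequence>
      <xsd:element name="Symbol" type="xsd:string"/>
      <xsd:element name="Quantity" type="xsd:positiveInteger"/>
      <!-- choice acting as a particle inside the sequence:
           exactly one of the two alternatives. -->
      <xsd:choice>
        <xsd:element name="LimitPrice" type="xsd:decimal"/>
        <xsd:element name="AtMarket" type="xsd:boolean"/>
      </xsd:choice>
      <!-- minOccurs and maxOccurs override the default of 1. -->
      <xsd:element name="Note" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="AddressType">
    <!-- all: the children may appear in any order; all itself
         cannot be nested inside another compositor. -->
    <xsd:all>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="PostCode" type="xsd:string" minOccurs="0"/>
    </xsd:all>
  </xsd:complexType>
</xsd:schema>
```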
The sequence compositor is the one that is most often encountered in WSDs. This seems a good choice; even if, conceptually, particles could occur in any order, nailing down the order will make parsing of messages that bit easier. However, implementations often do not observe the order constraints. This can be shown by invoking a web service with elements in a different order from the one laid down by a sequence: it often does not seem to matter. That is not such a bad thing. After all, if the server is more liberal in what it accepts than it strictly needs to be, this does not harm well-behaved clients and it offers some margin for error to more sloppily implemented clients. In other words, a server that does this can hardly be accused of being in breach of contract. Not so if the server cannot guarantee the order of the particles that are being sent back. I once spent a good deal of time working through the ramifications of such a server implementation.
The first reflex is to replace sequence with all compositors. However, be aware that the remedy is not without its problems since the expressiveness of this compositor has been severely curtailed in the WXS spec. A detailed account of why this is so and what the precise constraints are is beyond the current scope. However, the main limitation has already been pointed out: all cannot be used as a particle. Since derivation by extension in effect uses the compositor of the base type as a particle in the subtype, opportunities for reuse of types defined with all are limited. Derivation is covered in further detail in a dedicated section.
Schema versions
The current WXS Recommendation is 1.0 and its namespace is http://www.w3.org/2001/XMLSchema. However, some implementations still being used today follow the specifications of previous working drafts, e.g. http://www.w3.org/1999/XMLSchema. This is unfortunate and the perpetrators should be encouraged to migrate to the released standard, but if you should come across such implementations, here are two of the common pitfalls. Firstly, there is a WXS data type in common use that has changed from the 1999 to the 2001 version: 1999's timeInstant became 2001's dateTime. Make sure that the data type you use fits the version of WXS. Secondly, derivations also changed significantly between 1999 and 2001. These will be covered in the following section.
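The first pitfall comes down to keeping the namespace and the type name in step, as this small sketch with a hypothetical TradeTime element shows:

```xml
<!-- 2001 Recommendation namespace: the date and time type is dateTime. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:trades">
  <xsd:element name="TradeTime" type="xsd:dateTime"/>
  <!-- Under the 1999 working draft namespace, http://www.w3.org/1999/XMLSchema,
       the corresponding declaration would use type="xsd:timeInstant" instead. -->
</xsd:schema>
```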
Derivations
Derivation is a technique to define subtypes of a given base type. There are two kinds of derivation in WXS: extension and restriction. The former adds components at the end of the content model of the base type, the latter constrains the base type. Hence valid instances of a subtype derived by extension are not necessarily valid instances of the base type. Valid instances of a subtype derived by restriction, on the other hand, are always valid instances of the base type.
A subtype may be used anywhere where its base type is used, unless otherwise specified. This may have the following impact on message definitions: assume that a message definition declares a part with type Foo, and Bar is derived by extension from Foo. A party may send an element of type Bar in such a message. The recipient may be unable to validate this message. Fortunately, it is possible to turn off the ability to substitute subtypes for base types by using the block attribute on the base type or on an element declared to be of a given base type.
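A sketch of the Foo and Bar scenario with substitution blocked on the base type. The content of the two types, the payload element and the namespace are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:t="urn:example:derivation"
            targetNamespace="urn:example:derivation"
            elementFormDefault="qualified">

  <!-- block="extension": an element declared as Foo may not carry
       a type derived from Foo by extension in an instance. -->
  <xsd:complexType name="Foo" block="extension">
    <xsd:sequence>
      <xsd:element name="Id" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Bar appends Extra to the content model of Foo. -->
  <xsd:complexType name="Bar">
    <xsd:complexContent>
      <xsd:extension base="t:Foo">
        <xsd:sequence>
          <xsd:element name="Extra" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Alternatively, block can be set on an element declaration,
       which affects only that element. -->
  <xsd:element name="payload" type="t:Foo" block="extension"/>
</xsd:schema>
```

Without the block attributes, a sender could mark a payload element with xsi:type set to Bar and include the Extra child.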
Beware of derivation by extension: that is the message of this section so far. But what about derivation by restriction? From the discussion so far, it seems reasonable enough. However, it may seem less attractive once you realize that each particle of the content model of the subtype must be listed explicitly. This makes for very verbose definitions. It also does not bring the modularity benefits that an inheritance hierarchy in an OO programming language might bring: common features are not factored out, but must be repeated in each subtype. This is a change w.r.t. W3C XML Schema 1999 that caused a good deal of confusion.
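The verbosity is easy to see in a sketch. Here a hypothetical subtype StrictFoo merely drops an optional element of a hypothetical base type Foo, yet has to repeat everything it keeps.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:t="urn:example:derivation"
            targetNamespace="urn:example:derivation"
            elementFormDefault="qualified">

  <xsd:complexType name="Foo">
    <xsd:sequence>
      <xsd:element name="Id" type="xsd:string"/>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Comment" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Restriction: the content model is restated in full;
       only the optional Comment is left out. -->
  <xsd:complexType name="StrictFoo">
    <xsd:complexContent>
      <xsd:restriction base="t:Foo">
        <xsd:sequence>
          <xsd:element name="Id" type="xsd:string"/>
          <xsd:element name="Name" type="xsd:string"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```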
Arrays
Defining an array is one of the most confusing issues in WSDL. It has also caused a great many interoperability problems. Proceed with caution. A common approach is to extend the Array type defined in the SOAP encoding schema. In fact, this is mandated by WSDL 1.1 (see section 2.2). I was therefore surprised to see that rules 2110 through 2112 of the WS-I basic profile overrule this. On the other hand, I understand the working group's position: WSDL 1.1 makes a pig's ear of array specifications. The basic profile's approach, by contrast, is simple.
When I originally planned this article, it was my intention to write a good deal about SOAP arrays, how to use them in WSDs that are as near correct as is possible given the flaws in WSDL 1.1. However, given the basic profile's recommendation, the sensible thing is to avoid them altogether.
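A minimal sketch of what I take the simpler approach to be: plain WXS repetition with maxOccurs, and nothing derived from the SOAP encoding Array type. The names and the namespace are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:q="urn:example:quotes"
            targetNamespace="urn:example:quotes"
            elementFormDefault="qualified">

  <!-- An "array" of prices is simply a repeating element. -->
  <xsd:complexType name="PriceList">
    <xsd:sequence>
      <xsd:element name="Price" type="xsd:decimal"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="Prices" type="q:PriceList"/>
</xsd:schema>
```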
Conclusions
The purpose of this article was to flag some of the issues that require attention when modeling data. You should underestimate neither the importance of defining data nor the complexity of the task. It is important because the data passed across the web service interface largely determine the quality of the interface. It is complex because data modeling is inherently complex. Nonetheless, I cannot help feeling that W3C XML Schema 1.0 does not mitigate this complexity adequately. I look forward to tools better suited to data modeling for web services.
Resources
The W3C has published two normative documents on XML Schema: XML Schema Part 1: Structures and XML Schema Part 2: Datatypes. There is also a non-normative primer.
XML Schema by Eric van der Vlist, published by O'Reilly, 2002, proved to be an invaluable companion in my encounters with W3C XML Schema. Warmly recommended to anyone who is serious about data modeling with WXS.
xFront has an item on global versus local element and type declarations in its excellent best practices section. While you are browsing xFront, do have a look at what they have to say about web services as well, which is controversial and thought-provoking.