SOAP Encodings, WSDL, and XML Schema Types

February 20, 2002

Using a web service involves a sender and a receiver exchanging at least one XML message. The format of that message must be defined so that the sender can construct it and the receiver can process it. The format of a message includes the overall structure of the tree, the local name and namespace name of the elements and attributes used in the tree, and the types of those elements and attributes.

The names and types of the elements and attributes contained in the message can be defined in a schema. The Web Services Description Language (WSDL) can use a schema in this way. And if a WSDL description of the web service is the starting point, then the message format is known before a line of code is written. However, in many cases, the code that is to be exposed as a web service already exists. In other cases, developers are reluctant to start with WSDL, preferring to start with some programmatic data structure. Even in these cases, some description of the web service is needed in order for clients to correctly construct request messages and destructure responses. Ideally that description would still be WSDL; otherwise clients will have to learn to read and understand multiple description languages.

So in cases where a schema and associated WSDL are not the starting point, how is the WSDL to be generated and what format do the XML messages have? Many of the SOAP implementations that exist today will happily take a programmatic data type, typically a class definition of some sort, and serialize that type into XML. But in the absence of a schema, how do these implementations decide whether to use elements or attributes? How do they decide what names to give to those constructs and what the overall structure of the tree should be? The answer can be found in the SOAP Encoding section of Part 2 of the SOAP 1.2 specification.

SOAP Encoding

The SOAP encoding defines a set of rules for mapping programmatic types to XML. This includes rules for mapping compound data structures, array types, and reference types. With respect to compound data structures, the approach taken is reasonably straightforward; all data is serialized as elements, and the name of any given element matches the name of the data field in the programmatic type. For example, given the following Java class,


class Person
{
   String name;
   float age;
}

the name and age fields would be serialized using elements whose local names were name and age respectively. Both elements would be unqualified; that is, their namespace name would be empty. In cases where the name of a field would not be a legal XML name, Appendix A of the spec provides a mapping algorithm.
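The compound-type rule can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how any particular SOAP toolkit works: FieldSerializer, escape, and serialize are hypothetical names, the field order relies on what reflection happens to return, and the Appendix A name mapping is not applied.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper: one unqualified child element per field,
// named after the field, as the compound-type rule describes.
class FieldSerializer {

    static class Person {
        String name;
        float age;

        Person(String name, float age) {
            this.name = name;
            this.age = age;
        }
    }

    // Escapes the characters that may not appear literally in element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Serializes every instance field of 'value' as a child of <elementName>.
    // Field names are assumed to be legal XML names already.
    static String serialize(String elementName, Object value) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(elementName).append('>');
        for (Field field : value.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // skip compiler-generated and static fields
            }
            try {
                field.setAccessible(true);
                String text = String.valueOf(field.get(value));
                xml.append('<').append(field.getName()).append('>')
                   .append(escape(text))
                   .append("</").append(field.getName()).append('>');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        xml.append("</").append(elementName).append('>');
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize("person", new Person("Hayley", 30f)));
    }
}
```

Note that Java formats the float field as 30.0 rather than 30; both are acceptable lexical forms for a floating-point value.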

The mapping of reference types is more complicated. It involves serializing the instance and marking it with an unqualified attribute whose local name is id. All other references to that instance are then serialized as empty elements with an unqualified attribute whose local name is href. The value of the href is a URI that references the relevant serialized instance via its id attribute. This mechanism provides a way to serialize graphs, including cyclic graphs, in XML.
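To make the id/href mechanism concrete, here is a minimal sketch that follows the description above. The names GraphSerializer and friend are hypothetical, every instance receives an id whether or not it is referenced again, and the id values (id1, id2, ...) are an arbitrary choice made for this example.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of id/href serialization for an object graph.
class GraphSerializer {

    static class Person {
        String name;
        Person friend; // may point back to an instance already serialized

        Person(String name) {
            this.name = name;
        }
    }

    // Identity-based map: two equal-looking objects are still distinct instances.
    private final Map<Object, String> ids = new IdentityHashMap<>();
    private final StringBuilder xml = new StringBuilder();

    // The first visit writes the instance with an id attribute;
    // later visits write an empty element whose href points at that id.
    // Names are assumed to contain no markup characters.
    private void write(String elementName, Person person) {
        String knownId = ids.get(person);
        if (knownId != null) {
            xml.append('<').append(elementName)
               .append(" href=\"#").append(knownId).append("\"/>");
            return;
        }
        String id = "id" + (ids.size() + 1);
        ids.put(person, id);
        xml.append('<').append(elementName).append(" id=\"").append(id).append("\">");
        xml.append("<name>").append(person.name).append("</name>");
        if (person.friend != null) {
            write("friend", person.friend);
        }
        xml.append("</").append(elementName).append('>');
    }

    static String serialize(String elementName, Person root) {
        GraphSerializer serializer = new GraphSerializer();
        serializer.write(elementName, root);
        return serializer.xml.toString();
    }

    public static void main(String[] args) {
        Person hayley = new Person("Hayley");
        Person martin = new Person("Martin");
        hayley.friend = martin;
        martin.friend = hayley; // cycle: terminates because of the href branch
        System.out.println(serialize("person", hayley));
    }
}
```

Without the identity map the cyclic case would recurse forever, which is the problem the id and href attributes exist to solve.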

Mapping Types

The SOAP Encoding also provides mappings from programmatic data types to the data types found in XML Schema Part 2: Datatypes. Thus given a programmatic data structure, the name and type of each element in the serialized XML can be determined. People have observed that the SOAP Encoding rules are as much about mapping between type systems as they are about mapping between instance formats.
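The type-system side of the mapping can be sketched as a lookup from Java types to XML Schema built-in datatypes. The pairings in the table below are assumptions chosen because the names line up naturally; the class TypeMapper, its describe method, and the xs prefix are hypothetical and only illustrate how a name and a type could be derived for each element.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive an element name and schema type for each field.
class TypeMapper {

    static class Person {
        String name;
        float age;
    }

    // Assumed pairings of Java types with XML Schema built-in datatypes.
    static final Map<Class<?>, String> XSD_TYPES = new HashMap<>();
    static {
        XSD_TYPES.put(String.class, "string");
        XSD_TYPES.put(boolean.class, "boolean");
        XSD_TYPES.put(int.class, "int");
        XSD_TYPES.put(long.class, "long");
        XSD_TYPES.put(float.class, "float");
        XSD_TYPES.put(double.class, "double");
    }

    // Returns one element declaration per instance field of the given class.
    static List<String> describe(Class<?> type) {
        List<String> declarations = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // skip compiler-generated and static fields
            }
            String xsdType = XSD_TYPES.get(field.getType());
            if (xsdType == null) {
                throw new IllegalArgumentException("No mapping for " + field.getType());
            }
            declarations.add("<xs:element name=\"" + field.getName()
                    + "\" type=\"xs:" + xsdType + "\"/>");
        }
        return declarations;
    }

    public static void main(String[] args) {
        for (String declaration : describe(Person.class)) {
            System.out.println(declaration);
        }
    }
}
```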

Given a web service that accepts Person data structures as input, perhaps to add them to some list of people that it maintains, a SOAP message to that web service might look like this:


<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
  <soap:Body>
    <pre:Add xmlns:pre="http://example.org/lists"
             soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
      <person>
        <name>Hayley</name>
        <age>30</age>
      </person>
    </pre:Add>
  </soap:Body>
</soap:Envelope>

The value of the encodingStyle attribute states that the SOAP Encoding rules were followed when serializing the data. This enables the deserializer at the other end of the pipe to deserialize the message correctly. Other encoding styles can be used with SOAP in which case the encodingStyle attribute would have a different URI value.
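A receiver might inspect the attribute before choosing a deserializer. The sketch below uses the standard Java DOM API with namespace awareness switched on; EncodingStyleCheck and its methods are hypothetical names, and for brevity it looks only at the named element itself rather than searching its ancestors for an attribute in scope.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch: read soap:encodingStyle from a received message.
class EncodingStyleCheck {

    static final String SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope";
    static final String SOAP_ENC = "http://www.w3.org/2001/12/soap-encoding";

    // The message from the article, written as a single string.
    static final String MESSAGE =
        "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\">"
      + "<soap:Body>"
      + "<pre:Add xmlns:pre=\"http://example.org/lists\""
      + " soap:encodingStyle=\"" + SOAP_ENC + "\">"
      + "<person><name>Hayley</name><age>30</age></person>"
      + "</pre:Add></soap:Body></soap:Envelope>";

    // Returns the encodingStyle value on the first matching element,
    // or the empty string if that element does not carry the attribute.
    static String encodingStyleOf(String xml, String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element element =
                    (Element) document.getElementsByTagNameNS(namespace, localName).item(0);
            return element.getAttributeNS(SOAP_ENV, "encodingStyle");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read encodingStyle", e);
        }
    }

    static boolean usesSoapEncoding(String xml, String namespace, String localName) {
        return SOAP_ENC.equals(encodingStyleOf(xml, namespace, localName));
    }

    public static void main(String[] args) {
        System.out.println(usesSoapEncoding(MESSAGE, "http://example.org/lists", "Add"));
    }
}
```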

Type casting

It is worth noting that the foregoing message carries enough information for the receiver to figure out the type of all the elements. This is because the type is tied to the element name. Both sender and receiver know what the names of the elements are. And they also know the names and types of the fields in the programmatic data type. Given that the element name is so closely tied to the field name, the type of the element can also be determined.

There are some cases where exact type information may not be known until runtime. One is the case of a web service which accepts data in a similar fashion to the COM VARIANT, CORBA any, or Java Object. Such a service specifies nothing about the type of the data at design time. Rather, type information must be provided at runtime. In reality, such services do not accept absolutely any type but work on a reasonably small subset of types, generating errors when unknown types are encountered.

Another case is where further classes are derived from the Person class, for example, RacingDriver and FootballPlayer. Assuming the web service understands these classes, they could be submitted in request messages.

In both cases, the "totally" polymorphic element and the more specific case of explicitly derived types, the element name is no longer enough to fully identify the type of the element. Something more is needed.

The SOAP Encoding rules allow the type attribute from the http://www.w3.org/2001/XMLSchema-instance namespace to be used to specify that a particular type is being passed at runtime. The person element in the message would then look like this:


<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="RacingDriver">
  <name>Martin</name>
  <age>34</age>
  <bestplace>6th</bestplace>
</person>

It is worth noting that the xsi:type attribute is of type QName. Thus, strictly speaking, the above example refers to an unqualified RacingDriver type. In reality a namespace should probably be assigned to types to avoid name clashes. Also, xsi:type is only needed when the exact type is not known until runtime. In cases where both sides know the types in advance, which is the most common case, xsi:type is redundant.
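Because the attribute value is a QName, a receiver has to resolve any prefix against the namespace declarations in scope before it can pick a type. The following sketch shows one way to do that with the standard DOM API; XsiTypeReader is a hypothetical name, and the namespace http://example.org/types in the usage example is an assumption made for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch: resolve the xsi:type QName on a document's root element.
class XsiTypeReader {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    // Returns the runtime type named by xsi:type, or null if the attribute is absent.
    static QName runtimeType(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String value = root.getAttributeNS(XSI, "type").trim();
            if (value.isEmpty()) {
                return null;
            }
            int colon = value.indexOf(':');
            String prefix = colon < 0 ? null : value.substring(0, colon);
            String localPart = value.substring(colon + 1);
            // A null prefix asks for the default namespace, if one is in scope.
            String namespace = root.lookupNamespaceURI(prefix);
            if (prefix != null && namespace == null) {
                throw new IllegalArgumentException("Undeclared prefix: " + prefix);
            }
            return new QName(namespace == null ? "" : namespace, localPart);
        } catch (Exception e) {
            throw new IllegalStateException("Could not resolve xsi:type", e);
        }
    }

    public static void main(String[] args) {
        String person =
            "<person xmlns:xsi=\"" + XSI + "\""
          + " xmlns:t=\"http://example.org/types\""
          + " xsi:type=\"t:RacingDriver\">"
          + "<name>Martin</name><age>34</age><bestplace>6th</bestplace>"
          + "</person>";
        System.out.println(runtimeType(person));
    }
}
```

With a qualified value such as t:RacingDriver, two services can each define a RacingDriver type without colliding, since the namespace name is part of the type's identity.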

Conclusions

Whenever messages are sent, some type information is known in advance. In some cases all types are completely known and no further information beyond the element names is needed. In other cases, more specific type information may be communicated at runtime. In such cases, the xsi:type attribute is used and the types really need to be assigned to namespaces.

It would seem that whenever and however we define message formats for a given web service exchange we are really defining a schema for those messages. Thus the SOAP encoding is really about mapping from programmatic type systems to an XML type system, that of XML Schema. Some aspects of that mapping work very well; other aspects, such as references, do not map particularly well, due to the tree nature of XML. Given that the serialization format is XML, and XML is a tree, serious thought should be given to whether more esoteric programmatic constructs such as references need to be directly modeled in SOAP. If such constructs really are needed, an XML Schema friendly approach should be taken.