A Brief History of SOAP

April 4, 2001

Don Box

It's been a little more than three years since I first started working in XML in general and SOAP in particular. For the past year or so, my own SOAP work has been pretty minimal, mainly because without a stable XML Schema specification, the thought of building tons of SOAP support plumbing seemed pretty futile. Now that the XML Schema WG has more or less completed its work, it's time to get back to work (for me at least).

My first "official" act in this next phase of SOAP's development is to take a few minutes to retrace the steps that got us here. Hence this article.

In the Beginning: SOAP 98

When SOAP started in early 1998, there was no schema language or type system for XML (in fact, XML 1.0 had just become a full Recommendation that quarter). If you look at earlier versions of the SOAP spec (including XML-RPC, which was subsetted from the 1998 SOAP spec), most of the focus was on defining a type system. The original type system of SOAP (and XML-RPC) had a handful of primitive types, composites that are accessed by name (a.k.a. structs) and composites accessed by position (a.k.a. arrays). Once we had these representational types in place, we modeled behavioral types by defining operations/methods in terms of pairs of structs and, at least on the DevelopMentor and Microsoft sides, aggregated these operations into interfaces. Hence the RPC flavor that people associate with SOAP.
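To make the shape of that early type system concrete, here is a minimal Python sketch. It is an illustration only, not the wire format of the 1998 SOAP drafts or of XML-RPC: the `to_xml` helper, the `item` element name, and the `GetQuote` operation are all hypothetical. It shows primitives, a composite accessed by name (struct), a composite accessed by position (array), and an operation modeled as a request/response pair of structs.

```python
import xml.etree.ElementTree as ET

def to_xml(name, value):
    """Serialize a Python value into an element called `name`.

    dict  -> struct (members accessed by name)
    list  -> array  (members accessed by position)
    other -> primitive, stored as text
    All element names are hypothetical; this is not a real SOAP encoding.
    """
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, member in value.items():
            elem.append(to_xml(key, member))
    elif isinstance(value, list):
        for member in value:
            elem.append(to_xml("item", member))
    else:
        elem.text = str(value)
    return elem

# An operation is modeled as a pair of structs:
# one struct for the request, one for the response.
request = to_xml("GetQuote", {"symbols": ["MSFT", "IBM"], "currency": "USD"})
response = to_xml("GetQuoteResponse", {"prices": [57.5, 96.25]})

print(ET.tostring(request, encoding="unicode"))
print(ET.tostring(response, encoding="unicode"))
```

Grouping several such request/response pairs under one name would give the interface-style aggregation described above.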

Was SOAP the first attempt to add a behavioral type system to XML? Not at all. I recall scanning the landscape at the time. The existing proposals either assumed a COM type system underneath (unacceptable, since even back in 1998 we knew COM wasn't the ultimate type system) or were very EDI-like, which would alienate parts of the development community. For that reason, we looked at the existing serialization formats (ASN.1 BER, NDR, XDR, CDR, JRMP) and RPC protocols (GIOP/IIOP, DCE/DCOM, RMI, ONC) and tried to hit the sweet spot that would satisfy the 80% case elegantly but could be bent to adapt to the remaining 20% case.

So why didn't we ship SOAP back in 1998? That one's easy: Microsoft politics.

The original contributors to SOAP within MS worked on the COM/MTS team. At the same time, the XML group within MS was working on XML-Data, which became one of the many seeds for the XML Schema language we know today. As is often the case in large companies, the two groups within MS didn't see eye to eye, so public support for SOAP got shelved within MS for some period of time. (As a side note, I was one of those people who didn't get XML-Data when I first encountered it, and I have publicly apologized to Andrew Layman at least twice for being so dense.)

Unwilling to wait on the slow process of getting MS to act on SOAP beyond a press release, Dave Winer went out on his own and shipped the XML-RPC specification, which was based on a subset of the original SOAP type system. I spent the rest of the year working on Java metadata grunge, including among other things a projection of Java class files onto XML.

SOAP Phase 2: 1999-2000

By the time a SOAP specification finally shipped using the name "SOAP" (4Q1999), the W3C XML Schema language was by no means done, but it certainly had progressed to a point where it became obvious to most of the SOAP authors that we needed to leverage and integrate the work of the Schema Working Group as much as possible. Their primitive types were a superset of what we needed for SOAP. Their composite type system was mostly a superset of what we needed for SOAP. It would have been folly to ignore their work.

Ideally SOAP would have taken the representational type system of XML Schemas verbatim and simply added the notion of behavioral types and operations/methods. Unfortunately, XML Schemas lacked (and still lacks) support for synthetic types such as typed references and arrays. While you can define things that look like typed references and arrays in the schema language, these constructs are not really native to XML Schemas. Worse, you would need to predefine these reference and array types, which makes it really difficult to isomorphically move back and forth between say a Java class and an XML Schema complex type. For that reason, SOAP needed to augment the type system with the soap:reference and soap:Array types. It is interesting to note that the Schemas Working Group tried to tackle the typed reference issue; but, unfortunately, it couldn't converge on a solution that would support typed references as they appear in most programmatic type systems.
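Why typed references matter is easier to see with an object graph that contains the same object twice. The following Python sketch is a simplified illustration, loosely modeled on the id/href style of multi-reference encoding; the `payload`, `people`, and `item` element names, the un-namespaced `arrayType` attribute, and the `Person` class are assumptions made for the example, not the actual SOAP encoding rules.

```python
import xml.etree.ElementTree as ET

class Person:
    def __init__(self, name):
        self.name = name

def encode_people(people):
    """Encode a list of Person objects as an array of references.

    Each distinct object is written once with an id; array items point
    to it by href, so shared identity survives serialization.
    """
    root = ET.Element("payload")
    array = ET.SubElement(root, "people", {"arrayType": f"Person[{len(people)}]"})
    ids = {}  # maps id(obj) -> assigned XML id
    for person in people:
        key = id(person)
        if key not in ids:
            ids[key] = f"ref-{len(ids) + 1}"
            target = ET.SubElement(root, "Person", {"id": ids[key]})
            ET.SubElement(target, "name").text = person.name
        ET.SubElement(array, "item", {"href": "#" + ids[key]})
    return root

def decode_people(root):
    """Rebuild the list, resolving each href to a single shared object."""
    objects = {e.get("id"): Person(e.find("name").text) for e in root.findall("Person")}
    return [objects[item.get("href")[1:]] for item in root.find("people")]

alice = Person("Alice")
bob = Person("Bob")
decoded = decode_people(encode_people([alice, bob, alice]))
print(decoded[0] is decoded[2])  # the two "Alice" slots are one object again
```

Without the reference mechanism, the two "Alice" entries would be written as two independent copies, and the round trip back to a programmatic type system would lose the fact that they were the same object.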

Most of what the 4Q1999 SOAP specifications did was simply illustrate how to model typed references and arrays in the W3C XML Schema type system. Period. We also had a model for adding optional and mandatory protocol headers (like CORBA's service contexts and DCOM's ORPCTHIS/THAT), but that was it. Frankly, had the schema specification been a full REC in 4Q1999, the SOAP specification would have been at most 3-4 pages. However, the XML Schema specification was changing radically with each successive Working Draft, so those of us working on SOAP had to deliberately insulate ourselves from the churn that was W3C XML Schema during 1999 and 2000 in order to make any progress whatsoever.

To me, the biggest technical issue that faced SOAP in 1999 and 2000 was the lack of metadata. DevelopMentor tried to introduce a simple metadata format (CDL) that was isomorphic with the XML Schema type system, yet didn't tie us to the rather fluid schema language. Dave Winer totally balked at the idea of metadata, indicating that human-readable descriptions were all that was needed. Certainly folks like Eric Raymond seem to agree with him. The reason we abandoned CDL, however, was a discussion with Gopal Kakivaya of Microsoft, who convinced us that what we needed could be achieved by annotating XML Schemas with additional SOAP-specific hints that were allowed (and in fact anticipated) by the Schema specification. At this point, DevelopMentor joined the Schemas WG and most of our effort internally moved towards XML Schema support. We use XML Schema a lot around DM, having shipped an XSD compiler for C++ that we use internally.

The biggest non-technical issue that faced SOAP in 1999 and 2000 was the hideous nature of vendor wars. The FUD that flew around the trade press and vendor web sites was downright embarrassing. I recently ran across a Sun Reality Check that made me ill. In particular, the following quote blew me away:

SOAP has changed a lot. It started to become interesting to us when IBM made additions to the mediocre specification that Microsoft initially championed (you're right, we thought that specification was a bad idea).

If SOAP/1.0 (the last pre-IBM version) was a bad idea, then so was SOAP/1.1 (the first post-IBM version, which was submitted to the W3C). There were no major improvements to SOAP from 1.0 to 1.1. The specification was reorganized to make the modular design of SOAP more apparent. However, the few minor technical changes we made were arguably a step backward (in fact, I believe to date there are no SOAP implementations that do anything meaningful with the one new feature, SOAPActor).

The Post-SOAP Era: 2001 and beyond

So where are we now? Tough question. Here are some observations about the current state of play.

The XML Schema specification is stable and now a Proposed Recommendation.
To me, this is the most important advance for people who care about XML protocols and messaging in general and SOAP in particular. The fact that no major changes can be made before advancing to a full W3C recommendation means that the industry at large knows what they are dealing with when it comes to applying types to XML. I have stated before, and I still stand by my belief, that without XML Schemas, XML is a balkanized standard and its utility for software, component, or service integration is fairly minor. The Schema specification does most of the heavy lifting for SOAP, and it kills me that we can't do a SOAP/1.2 to address the new schema language. Which brings me to my next observation.

The W3C now has an XML Protocol Working Group.

SOAP is now where it belongs. Until we got W3C buy-in, vendors were skittish given the nature of the industry. Now that SOAP has been subsumed into the XML Protocol work, the big vendors have (for the most part) stopped arguing about SOAP and we have a fairly open process for beating the protocol into shape. In my opinion, one of the smartest things the WG did was to immediately define their relationship to the XML Schema type system.

We are somewhat closer to having a standardized metadata format for SOAP.
While far from perfect, WSDL is as close as we've ever come to having a workable metadata standard that more than three people can agree on for longer than a week at a time. Is WSDL perfect? Not by a long shot. Is it workable? For the most part, yes. Does SOAP/XML Messaging make sense without something like WSDL? No way. My own criticisms of WSDL relate to WSDL's current form having a somewhat schizophrenic relationship to XML Schema. (In fact, there are several ways in which WSDL and XML Schema are completely incompatible.) Despite my criticisms, portions of WSDL are more than workable, albeit overly verbose and indirect, for every SOAP scenario or application I have dealt with in the past 3 years. Hopefully the XML Protocol Activity will focus on finishing the WSDL specification and give the world a reasonable way of describing, validating, and automating XML-based services.

For the most part, people have stopped arguing about SOAP.

SOAP is what most people would consider a moderate success. The ideas of SOAP have been embraced by pretty much everyone at this point. The vendors are starting to support SOAP to one degree or another. There are even (unconfirmed) reports of interoperable implementations, but frankly, without interoperable metadata, I am not convinced wire-level interoperability is all that important. It looks like almost everyone will support WSDL until the W3C comes down with something better, so perhaps by the end of 3Q2001 we'll start to see really meaningful interoperability.

Epilogue

SOAP's original intent was fairly modest: to codify how to send transient XML documents to trigger operations or responses on remote hosts. Because of our timing, we were forced to tackle issues that the Schemas WG has since solved, which caused the "S" in SOAP to be somewhat lost.

At this point in time, I firmly believe that only two things are needed for mid- to long-term convergence:

  1. The XML Schemas WG should address the issue of typed references and arrays. Adding support for these two synthetic types would obviate the need for SOAP section 5. These constructs are broadly useful outside the scope of messaging and RPC applications, so it makes sense that the Schemas WG should address this.
  2. Define the handful of additional constructs needed to tie the representational types from XML Schemas into operations and WSDL-style portTypes.

WSDL comes close enough to providing the necessary behavioral constructs to XML Schemas, and I am cautiously optimistic that something close to WSDL could subsume SOAP entirely. I strongly encourage you to study the WSDL specification and submit comments, improvements, and errata so we can get convergence and interoperability in our lifetime.