Distributed XML
by Edd Dumbill
Characterizing SOAP
If RDF is the Prolog of XML, then SOAP is its Java. While RDF's heritage is the declarative disciplines of knowledge representation and logic programming, SOAP's heritage is in imperative, "conventional", object-oriented culture.
At its most basic level, SOAP is a set of rules for representing data in XML. Given a data structure, SOAP prescribes an agreed-upon serialization of it. That may sound incredibly similar to the basic explanation of RDF that I gave: indeed for any given graph, the RDF and SOAP representations are more or less identical. (Henrik Frystyk Nielsen gave a presentation on this at the 9th International World Wide Web conference in Amsterdam earlier this year.)
Beyond serialization of data, SOAP was created with messaging as its target application, providing an over-the-wire representation for messages. In contrast to RDF documents, which tend just to "sit there" until a processing application comes along, SOAP documents are actively passed in between computers (the destinations known as "endpoints"). SOAP further provides a mapping for those message exchanges to implement a remote procedure call mechanism.
Because of the defined semantics of what happens when a computer receives a SOAP message, SOAP servers will also publish contracts about what they will and will not accept.
Essential SOAP
SOAP is a protocol for serializing data and wrapping it in an envelope so it can be transported between endpoints. Like RDF, it attempts to bring some order to things that could be done in many ways. The scenario it addresses is how to perform machine-to-machine communication using XML.
This scenario breaks down into two sub-problems:
Encoding: each machine must use the same way of representing data types and wrapping up the message
Protocol: each machine must use the same rules of choreography for message exchange
Let's look at a simple example of a SOAP message:
Encoding example
<x:PurchaseOrder>
  <x:CustomerName>Henry Ford</x:CustomerName>
  <x:ShipTo>
    <a:Street>5th Ave</a:Street>
    <a:City>New York</a:City>
    <a:State>NY</a:State>
    <a:Zip>10010</a:Zip>
  </x:ShipTo>
  <x:PurchaseLineItems>
    <x:Order>
      <x:Product>Apple</x:Product>
      <x:Price>1.56</x:Price>
    </x:Order>
    <x:Order>
      <x:Product>Peach</x:Product>
      <x:Price>1.48</x:Price>
    </x:Order>
  </x:PurchaseLineItems>
</x:PurchaseOrder>
(The namespace prefixes x and a are assumed to be bound to some meaningful namespace URI. Note the use of namespaces here: as in RDF, they allow global specification of the semantics of a particular element.)
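To make the "wrapping up" half of the encoding problem concrete, here is a minimal sketch in Python of the same purchase order placed inside a SOAP envelope and read back out by a receiver. The envelope namespace is the SOAP 1.1 one; the two URIs bound to x and a are hypothetical stand-ins, as is the read_order helper, so treat this as an illustration rather than a reference implementation.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace. The other two URIs are hypothetical
# stand-ins for whatever the x and a prefixes are really bound to.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ORDER_NS = "http://example.com/ns/orders"     # assumed binding for x
ADDRESS_NS = "http://example.com/ns/address"  # assumed binding for a

MESSAGE = f"""\
<env:Envelope xmlns:env="{SOAP_ENV}" xmlns:x="{ORDER_NS}" xmlns:a="{ADDRESS_NS}">
  <env:Body>
    <x:PurchaseOrder>
      <x:CustomerName>Henry Ford</x:CustomerName>
      <x:ShipTo>
        <a:Street>5th Ave</a:Street>
        <a:City>New York</a:City>
        <a:State>NY</a:State>
        <a:Zip>10010</a:Zip>
      </x:ShipTo>
      <x:PurchaseLineItems>
        <x:Order><x:Product>Apple</x:Product><x:Price>1.56</x:Price></x:Order>
        <x:Order><x:Product>Peach</x:Product><x:Price>1.48</x:Price></x:Order>
      </x:PurchaseLineItems>
    </x:PurchaseOrder>
  </env:Body>
</env:Envelope>
"""

def read_order(xml_text):
    """Return (customer, city, [(product, price), ...]) from the envelope."""
    ns = {"env": SOAP_ENV, "x": ORDER_NS, "a": ADDRESS_NS}
    root = ET.fromstring(xml_text)
    order = root.find("env:Body/x:PurchaseOrder", ns)
    customer = order.findtext("x:CustomerName", namespaces=ns)
    city = order.findtext("x:ShipTo/a:City", namespaces=ns)
    items = [
        (o.findtext("x:Product", namespaces=ns),
         float(o.findtext("x:Price", namespaces=ns)))
        for o in order.findall("x:PurchaseLineItems/x:Order", ns)
    ]
    return customer, city, items
```

The receiver looks elements up by namespace URI, not by prefix, which is why both ends only need to agree on the URIs and not on the letters x and a.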
SOAP Processing Models
SOAP's typical deployment scenarios are different from RDF's. SOAP is most definitely in the "enterprise" buzzword camp, and is being pushed in relation to e-business services. Aside from carrying business messages between servers, one aspect of SOAP receiving much attention is its ability to perform RPC-over-HTTP. This feature has received frowns from many departments, especially those conscious of network security.
The worries center around the naïve deployment of a SOAP server: default bindings might expose all manner of internal services to the outside world via port 80. The incentive for the developer is that deploying SOAP services is a lot simpler than CORBA or DCOM, along with none of that pesky wrangling with the network administrator. It has been pointed out, however, that firewall technology will simply get enhanced to sniff the contents of normal HTTP traffic in order to be assured that only allowed SOAP requests are passed through. This still doesn't solve the case of SOAP endpoints exposed over SSL: there's a whole can of worms that SOAP opens, which security experts are still peering into.
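As a rough idea of what such content sniffing might involve, the sketch below checks the first element inside a SOAP 1.1 Body against an allow-list and rejects everything else. The allow-list entries and the is_allowed function are hypothetical, and the check assumes the filter can see the unencrypted request body, which is exactly what the SSL case takes away.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope

# Hypothetical allow-list: (namespace URI, element name) pairs that the
# site is willing to pass through to internal services.
ALLOWED = {("http://example.com/ns/orders", "PurchaseOrder")}

def is_allowed(body_bytes):
    """Return True only if the first Body child is an allowed request."""
    try:
        root = ET.fromstring(body_bytes)
    except ET.ParseError:
        return False  # not well-formed XML: reject
    if root.tag != f"{{{SOAP_ENV}}}Envelope":
        return False  # not a SOAP 1.1 envelope: reject
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        return False  # nothing to inspect: reject
    tag = body[0].tag  # ElementTree spells qualified names as "{uri}local"
    if not tag.startswith("{"):
        return False  # unqualified request element: reject
    uri, local = tag[1:].split("}", 1)
    return (uri, local) in ALLOWED
```

The design choice here is to deny by default: anything that fails to parse, lacks an envelope, or names an unknown operation is dropped.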
RPC with SOAP, maligned or otherwise, fits in well with the changing styles of programming accompanying the increase in web applications. Use of interpreted scripting languages (like Perl, JavaScript, Python) is on the rise, with a tendency towards a number of small programs with well-defined responsibilities. The platform independence and lack of prerequisite machinery make SOAP an attractive option for interprogram communication in this scenario. However, one shouldn't underestimate what is needed for a fully-fledged distributed object system, such as CORBA, and these complications will surely come.
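To give a feel for how little machinery a script needs, here is a sketch of the client side of an RPC-over-HTTP call using only the Python standard library. The build_rpc_request helper, the service namespace, and the "namespace#method" form of the SOAPAction value are assumptions made for illustration; a real service would publish its own values.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope

def build_rpc_request(service_ns, method, params):
    """Return (headers, body) for a SOAP 1.1 RPC call sent by HTTP POST.

    service_ns, method and the keys of params are supplied by the caller;
    the keys must be valid XML element names.
    """
    ET.register_namespace("env", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The procedure call is an element named after the method,
    # with one child element per parameter.
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    payload = ET.tostring(envelope, encoding="utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        # Assumed convention for the action URI; services differ.
        "SOAPAction": f'"{service_ns}#{method}"',
    }
    return headers, payload
```

The returned pair could then be handed to something like urllib.request.Request(url, data=payload, headers=headers), where url is the (hypothetical) endpoint address; supplying data makes the request a POST.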
Comparing SOAP's processing model to RDF's, we can see that with SOAP it's the documents that do the walking, and the computation is distributed over multiple computers. Functionality, rather than data, is what gets aggregated in the SOAP model. With that come some hard problems to solve, too, mainly in the areas of latency and reliability, which is what leads me to suspect that SOAP will find its immediate home most comfortably in predictable network situations.
SOAP Infrastructure Requirements
Like RDF, SOAP also depends on known vocabularies in order to communicate with predictable semantics. Depending on the scenarios of use, vocabularies used with SOAP can have the following scopes:
private: in a pre-arranged one-off situation, a custom vocabulary can be agreed among the parties involved
industry-wide: vocabularies that are specific to particular industries, often maintained by an industry standards body
global: reusable vocabularies that cut across all industries and spheres of use
Because SOAP documents are intended to travel, they require additional infrastructure, including servers which route messages around and deliver them to the correct software components. The facilities offered by these software components also need a description language (of which there are currently two competing specifications, one each from IBM and Microsoft), and also a means of discovery for these software interfaces.
Incidentally, interface description and discovery are great applications for RDF technology.
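The routing role mentioned above, delivering each message to the correct software component, can be sketched as a lookup keyed on the namespace-qualified name of the first element in the SOAP Body. The Router class and the names registered with it are hypothetical; a production server would add fault handling, header processing, and the description and discovery layers discussed here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope

class Router:
    """Deliver SOAP messages to components by qualified element name."""

    def __init__(self):
        self._handlers = {}

    def register(self, namespace, name, handler):
        # Key on the "{uri}local" form ElementTree uses for tags.
        self._handlers[f"{{{namespace}}}{name}"] = handler

    def dispatch(self, xml_text):
        root = ET.fromstring(xml_text)
        body = root.find(f"{{{SOAP_ENV}}}Body")
        if body is None or len(body) == 0:
            raise ValueError("message has no SOAP Body content")
        request = body[0]
        handler = self._handlers.get(request.tag)
        if handler is None:
            raise LookupError(f"no component registered for {request.tag}")
        return handler(request)
```

Because the key includes the namespace URI, two components can both handle an element called Order without colliding, provided their vocabularies live in different namespaces.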
SOAP/RDF Contrasts Revisited
Now we've seen the capabilities of both SOAP and RDF, we'll compare them once again, and see how they complement each other.
At the serialization level, SOAP and RDF are practically identical. They both deal with XML-encoded data.
In fact, one useful application of SOAP would be to carry RDF descriptions between RDF databases, supporting the RDF aggregation process.
SOAP and RDF start to diverge when you look at their components outside of basic serialization:
The use of URIs: RDF insists on URIs for all names. SOAP can use namespaces when needed. As a consequence, RDF has only one scope, the global one, whereas SOAP documents may live in different scopes, relying on the context of operation for the semantic interpretation of names.
RPC: the ability to be used for RPC is a unique facet of SOAP. Although "SOAPists" now downplay this, there's no doubt that RPC-over-HTTP is a major attraction of the technology, and it is one with practical uses.
Marketing: the two technologies are marketed very differently and aimed at different spheres, although they are not worlds apart either in philosophy or in terms of the people who worked on creating both technologies.
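The point about scope in the first item above can be seen in a few lines of Python. In this sketch (the document and both namespace URIs are made up for illustration), two elements share the local name Order but are globally distinct because of their namespaces, while the third has only a local name whose meaning depends on the context of operation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both namespace URIs are invented for this example.
DOC = """\
<batch xmlns:x="http://example.com/ns/orders"
       xmlns:m="http://example.com/ns/military">
  <x:Order>two peaches</x:Order>
  <m:Order>advance at dawn</m:Order>
  <Order>meaning depends on who receives this</Order>
</batch>
"""

def names(xml_text):
    """Return the expanded name of each child of the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]

for name in names(DOC):
    print(name)
# prints:
# {http://example.com/ns/orders}Order
# {http://example.com/ns/military}Order
# Order
```

RDF's insistence on URIs amounts to forbidding the third form, whereas SOAP leaves the choice to the parties exchanging the message.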
Their Place in the Future
In sum, SOAP provides a web-aware alternative to current object protocols like CORBA. It has a low cost of deployment and is supported by software right now. It still has issues to face in terms of interoperability, security, and description/discovery infrastructure.
RDF implements a computer-readable alternative to current web knowledge representation applications (i.e., HTML). It faces some immediate challenges in terms of intelligibility, and its immediate business uses are less than certain. In the long run, though, it presents the opportunity to transform the way the web is used.
Looking at the big picture, one can envisage SOAP and RDF operating in a complementary manner in the Web of the future. RDF-based technology can provide directory information to describe and locate SOAP services. SOAP could carry RDF graphs in between RDF aggregation services, or provide a "virtual graph" service from a provider like Amazon.com.
Both SOAP and RDF have a part to play in my dream of a totally integrated future. However, they also point to the need for some very significant work, only just getting started, on agreeing upon XML vocabularies and semantics. That is a hard problem, one which I expect will never be totally solved, and may cause us to develop the best "nearly-there" solutions we can, to continue getting the most out of the Web.