Beep BEEP!
This article is the last in a series examining how one might go about sending binary data as part of a SOAP message. This month we look at BEEP, the Blocks Extensible Exchange Protocol.
The primary inventor of BEEP is Marshall Rose, a long-time "protocol wonk" within the IETF community. Marshall has authored more than 60 RFCs, covering everything from core SNMP details to a DTD for IETF RFCs.
Like SOAP 1.2, BEEP is described in a transport-neutral manner and is defined in RFC 3080. The most common transport is TCP, and the TCP mapping is found in RFC 3081. If you are at all interested in network protocols, you should read RFC 3117, "On the Design of Application Protocols". At the BEEP site you can find HTML versions of BEEP and related RFCs generated from the DTD mentioned above.
BEEP actually addresses a wider range of problems than just SOAP and binary data. According to the RFC, BEEP is "a generic application protocol kernel for connection-oriented, asynchronous interactions." In other words, it's like an application-level transport protocol. Unlike many application protocols, it supports more than just lockstep request-response interchanges, which makes it suitable for applications such as event logging (see RFC 3195, for "Reliable Syslog"), and general peer-to-peer interactions.
One of the more interesting implications of BEEP's design principles is that it supports multiplexing -- that is, multiple channels communicating over a single transport stream. BEEP allows different applications, or multiple instances of the same application, to use the same stream for independent activities, including independent security mechanisms. For example, a single BEEP-over-TCP link between a browser and a web server would efficiently allow the browser to fetch a page over SSL-encrypted HTTP, while also streaming in multiple images over unencrypted HTTP.
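A BEEP message is the complete set of data to be exchanged. BEEP defines three styles of message exchange, distinguished by the type of reply the server (or, more accurately, the message recipient) returns: message/reply (a MSG answered by a RPY), in which the recipient performs the task and sends a positive reply; message/error (a MSG answered by an ERR), in which the recipient does not perform the task and sends a negative reply; and message/answer (a MSG answered by zero or more ANS messages followed by a NUL terminator), in which the recipient sends back a series of answers.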
Messages are divided into frames. The specification says that a message is normally sent in a single frame. Frames consist of a header, a payload, and a trailer. The header and trailer are encoded in printable ASCII and terminated with a CRLF -- in other words, typical IETF protocol style. The payload is an arbitrary set of octets.
The frame headers are almost identical and are structured as follows:
type SP channel SP msgno SP more SP seqno SP size CR LF
where SP, CR, and LF stand for the ASCII space, carriage-return, and linefeed characters, respectively.
The type specifies the message type and is one of five three-character strings -- MSG (message), RPY (reply), ANS (answer), ERR (error), or NUL (terminator) -- which correspond to the message types listed in the exchange patterns above.
The channel identifies the multiplex channel of the communication and is the printed form of a number between zero and 2^31 - 1. BEEP reserves channel zero for management tasks, like creating new channels. So even the simplest BEEP application needs two channels and is, therefore, likely to end up being multiplex-ready.
The msgno uniquely identifies the message. It's a number in the same format as channel and acts like a session identifier: a reply to a given message will have the same msgno. A msgno cannot be reused until the final response frame -- RPY, ERR, or NUL -- has been received.
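To make the reuse rule concrete, here is a minimal Python sketch of the bookkeeping one side might keep for a single channel. The class name `MsgnoTracker` and its methods are hypothetical, not part of any BEEP library, and the sketch assumes a msgno is released only when the last frame (more is a period) of a RPY, ERR, or NUL arrives.

```python
class MsgnoTracker:
    """Tracks message numbers still awaiting a final response on one channel.

    Hypothetical helper for illustration; not taken from a BEEP library.
    """

    FINAL_TYPES = {"RPY", "ERR", "NUL"}

    def __init__(self):
        self.outstanding = set()
        self.next_msgno = 0

    def allocate(self):
        """Return a msgno that is not currently in use and mark it outstanding."""
        while self.next_msgno in self.outstanding:
            self.next_msgno = (self.next_msgno + 1) % 2**31
        msgno = self.next_msgno
        self.outstanding.add(msgno)
        self.next_msgno = (msgno + 1) % 2**31
        return msgno

    def on_response(self, frame_type, msgno, more):
        """Release msgno once the last frame of a final response arrives."""
        if msgno not in self.outstanding:
            raise ValueError(f"response for unknown msgno {msgno}")
        # ANS frames never release the msgno; only RPY, ERR, or NUL do.
        if frame_type in self.FINAL_TYPES and more == ".":
            self.outstanding.discard(msgno)


tracker = MsgnoTracker()
first = tracker.allocate()
tracker.on_response("ANS", first, ".")   # an answer: msgno stays reserved
tracker.on_response("NUL", first, ".")   # the terminator: msgno is free again
```

Keeping the set per channel mirrors the text: a msgno only has meaning within its channel.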
The more indicator is a period if this is the only or last frame of a message. It's an asterisk if at least one other frame follows for this message. Note that frames can have a zero-length payload, so a final end-of-message marker can be sent by itself.
The seqno is an unsigned 32-bit number (i.e., twice the range of the other numeric fields) and specifies the offset of the first octet in the current payload. This is different from the conventional use of the term "sequence number"; perhaps "offset" might have been a better choice. Offsets are counted per channel, so the seqno of frame n is normally the sum of the payload sizes of the prior n - 1 frames sent on that channel. Some applications might want to skip ahead in the offset space to omit large sets of default values; for example, a distributed file system protocol could avoid sending a large number of all-zero-byte frames. Note, however, that RFC 3080 treats a frame whose seqno does not match the expected value for its channel as poorly formed, so a conforming peer would reject such gaps.
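The interplay between more and seqno is easier to see in code. The following Python sketch splits one message into frames with contiguous offsets. The function name `fragment` and the `max_frame` parameter are my own inventions for illustration, and the wrap-around at 2^32 is an assumption based on the field being an unsigned 32-bit number.

```python
def fragment(payload: bytes, max_frame: int, start_seqno: int = 0):
    """Yield (more, seqno, chunk) for one message split into frames.

    start_seqno is the offset already consumed by earlier payloads;
    max_frame is a hypothetical per-frame payload limit.
    """
    if max_frame <= 0:
        raise ValueError("max_frame must be positive")
    if not payload:
        # A zero-length payload is still a valid final frame.
        yield ".", start_seqno, b""
        return
    for offset in range(0, len(payload), max_frame):
        chunk = payload[offset:offset + max_frame]
        is_last = offset + len(chunk) >= len(payload)
        seqno = (start_seqno + offset) % 2**32
        yield ("." if is_last else "*"), seqno, chunk


for more, seqno, chunk in fragment(b"abcdefghij", 4):
    print(more, seqno, chunk)
```

Each seqno is simply the running total of the payload octets that came before it, which is why "offset" is arguably the better name.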
If a server is sending multiple replies to the client -- the message/answer message pattern -- there will be an ansno field to identify each answer. The combination of ansno and more is similar to nested messages within DIME and the MB/MF/CF bits.
The final field is the size, which specifies the number of octets in the frame's payload.
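Putting the header fields together, here is a hedged Python sketch of a header parser. The names `Header` and `parse_header` are hypothetical. It assumes that ansno, when present, follows size on ANS frames only (as in RFC 3080's ANS header), and it checks only the ranges described above rather than the full list of constraints in the RFC.

```python
from typing import NamedTuple, Optional

FRAME_TYPES = {"MSG", "RPY", "ANS", "ERR", "NUL"}
MAX_31 = 2**31 - 1   # upper bound for channel, msgno, size, ansno
MAX_32 = 2**32 - 1   # upper bound for seqno


class Header(NamedTuple):
    ftype: str
    channel: int
    msgno: int
    more: str
    seqno: int
    size: int
    ansno: Optional[int] = None


def _number(text: str, upper: int) -> int:
    """Parse a decimal field and check that it fits the allowed range."""
    if not text.isdigit():
        raise ValueError(f"not a decimal number: {text!r}")
    value = int(text)
    if value > upper:
        raise ValueError(f"number out of range: {value}")
    return value


def parse_header(line: bytes) -> Header:
    """Parse one frame header line, including its terminating CR LF."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CR LF")
    fields = line[:-2].decode("ascii").split(" ")
    expected = 7 if fields[0] == "ANS" else 6
    if fields[0] not in FRAME_TYPES or len(fields) != expected:
        raise ValueError(f"malformed header: {line!r}")
    ftype, channel, msgno, more, seqno, size = fields[:6]
    if more not in (".", "*"):
        raise ValueError(f"bad continuation indicator: {more!r}")
    ansno = _number(fields[6], MAX_31) if ftype == "ANS" else None
    return Header(ftype, _number(channel, MAX_31), _number(msgno, MAX_31),
                  more, _number(seqno, MAX_32), _number(size, MAX_31), ansno)


print(parse_header(b"RPY 0 0 . 0 143\r\n"))
```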
While BEEP is often used for XML or other textual data, payloads can be arbitrary. Payloads are MIME objects in that they may have MIME headers describing the type and encoding of the payload. BEEP adds two rules: the default Content-Type is application/octet-stream, and the default transfer encoding is binary. This seems like a reasonable trade-off; binary data has no overhead, and text data has a standard but minor overhead to describe its type.
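As a rough illustration of those two defaults, the Python sketch below separates a complete message payload into its MIME headers and body. The function name `split_payload` is hypothetical, and the sketch assumes the usual MIME entity layout (headers, a blank line, then the body) with CRLF line endings; it does not handle folded header lines.

```python
def split_payload(payload: bytes):
    """Return (content_type, transfer_encoding, body) for one message payload.

    Missing headers fall back to the BEEP defaults described above.
    """
    head, sep, body = payload.partition(b"\r\n\r\n")
    if payload.startswith(b"\r\n"):
        # No headers at all: the payload opens with the blank line.
        head, body = b"", payload[2:]
    elif not sep:
        # Assumption: with no blank line, treat everything as body.
        head, body = b"", payload
    headers = {}
    for line in head.split(b"\r\n"):
        if b":" in line:
            name, value = line.split(b":", 1)
            headers[name.strip().lower().decode("ascii")] = value.strip().decode("ascii")
    content_type = headers.get("content-type", "application/octet-stream")
    encoding = headers.get("content-transfer-encoding", "binary")
    return content_type, encoding, body


print(split_payload(b"Content-Type: application/beep+xml\r\n\r\n<greeting/>"))
print(split_payload(b"\r\n\x00\x01\x02"))
```

The second call shows the binary case: no headers, so the only overhead is the blank line.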
There is a set of simple constraints among the various fields that lets implementations perform a large number of error checks. Over a dozen are documented in the RFC.
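Two of the simplest checks can be sketched in a few lines of Python: the payload must be exactly size octets long, and it must be followed by the END trailer. The function name `split_frame` is hypothetical, and this is only a small subset of the checks the RFC lists.

```python
TRAILER = b"END\r\n"


def split_frame(data: bytes):
    """Return (header_fields, payload, rest) for the first frame in data.

    Raises ValueError if the frame is incomplete or inconsistent.
    """
    header_end = data.find(b"\r\n")
    if header_end < 0:
        raise ValueError("incomplete header")
    fields = data[:header_end].decode("ascii").split(" ")
    if len(fields) < 6:
        raise ValueError("too few header fields")
    size = int(fields[5])          # size is the sixth field on every frame type
    start = header_end + 2
    end = start + size
    payload = data[start:end]
    if len(payload) != size:
        raise ValueError("payload is shorter than the declared size")
    if data[end:end + len(TRAILER)] != TRAILER:
        raise ValueError("payload is not followed by the END trailer")
    return fields, payload, data[end + len(TRAILER):]


fields, payload, rest = split_frame(b"RPY 0 0 . 0 11\r\n<greeting/>END\r\n")
print(fields, payload, rest)
```

Because the payload length is declared up front, the payload itself can contain the bytes "END" without confusing the parser.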
BEEP's connection setup follows standard IETF practice: the client (or initiator) connects to a remote port, and the server (or listener) responds with a greeting message. The greeting message includes profile elements in which the server lists its supported facilities. For example, a server that has just accepted a connection might send the following greeting, indicating SOAP and TLS (SSL) support:
RPY 0 0 . 0 143
Content-Type: application/beep+xml

<greeting>
   <profile uri="http://iana.org/beep/TLS"/>
   <profile uri="http://iana.org/beep/soap"/>
</greeting>
END
From the frame header, we see that this is a reply (to the initial TCP connection), that it is the first message on the management channel, and that it is completely contained in this frame. This is a simple greeting; optional elements include codeset localization and identification of management features.
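An initiator will typically want to pull the advertised profile URIs out of the greeting before deciding which channel to start. A minimal Python sketch using the standard library's ElementTree follows; the function name `advertised_profiles` is hypothetical, and the sample greeting is just an example in the style of the one above.

```python
import xml.etree.ElementTree as ET


def advertised_profiles(greeting_xml: str):
    """Return the profile URIs listed in a BEEP greeting body."""
    root = ET.fromstring(greeting_xml)
    if root.tag != "greeting":
        raise ValueError(f"expected <greeting>, got <{root.tag}>")
    return [profile.get("uri") for profile in root.findall("profile")]


greeting = """<greeting>
   <profile uri="http://iana.org/beep/TLS"/>
   <profile uri="http://iana.org/beep/soap"/>
</greeting>"""

profiles = advertised_profiles(greeting)
print("http://iana.org/beep/soap" in profiles)
```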
The client now acknowledges the greeting message, immediately indicating that it wants to start a SOAP channel:
RPY 0 0 . 0 51
Content-Type: application/beep+xml

<greeting/>
END
MSG 0 1 . 51 138
Content-Type: application/beep+xml

<start number='1' serverName='www.example.com'>
   <profile uri='http://iana.org/beep/soap'/>
</start>
END
The number attribute identifies the channel, and the serverName attribute identifies the "virtual server" the client wishes to work with. A client can specify multiple profiles, and the server will determine which ones it can support and send them back in its reply:
RPY 0 1 . 143 79
Content-Type: application/beep+xml

<profile uri='http://iana.org/beep/soap'/>
END
Now both parties are using a single TCP connection for management and plain text SOAP communication. BEEP is a fairly simple protocol, allowing you to build powerful communication infrastructures with modest work.
The full SOAP-over-BEEP specification is RFC 3288. It says that, once the channel has been opened, both sides enter a "boot" phase. During this phase, information equivalent to the classic SOAP-over-HTTP headers is sent -- that is, the URI that would be POSTed to. The boot message is also a placeholder for future feature negotiation and has a trivial format:
<bootmsg resource='StockQuote'/>
As an optimization, BEEP allows a boot message and its reply to be piggy-backed into the channel start message, avoiding extra network round-trips (and latency). In practice, then, here is what a real start message and its reply would look like:
MSG 0 1 . 51 197
Content-Type: application/beep+xml

<start number='1' serverName='www.example.com'>
   <profile uri='http://iana.org/beep/soap'>
      <![CDATA[<bootmsg resource='StockQuote'/>]]>
   </profile>
</start>
END
RPY 0 1 . 143 112
Content-Type: application/beep+xml

<profile uri='http://iana.org/beep/soap'>
   <![CDATA[<bootrpy/>]]>
</profile>
END
Having established communication, simple attachments become trivial: send the message as multi-part MIME, as specified by the SOAP with Attachments Note. This is, in fact, the only attachment scheme currently specified in RFC 3288, which is unfortunate. SOAP should be able to take full advantage of the powerful asynchronous multiplex communications core provided by BEEP. It would seem that only two things are needed: first, a URN scheme for naming BEEP channels; second, that a SOAP message could refer to data coming in over separate channels, streaming in as it becomes available. Now, that would be cool.
I'll close with the following plea: Don, pick BEEP. Let me explain. Back in February, Don Box pointed out that HTTP has a number of limitations for use as a web services protocol. His criticisms are right on. It's my sincere hope that we don't get a new protocol: BEEP seems to meet all his requirements.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.