Beep BEEP!

October 16, 2002

This article is the last in a series examining how one might go about sending binary data as part of a SOAP message. This month we look at BEEP, the Blocks Extensible Exchange Protocol.

The primary inventor of BEEP is Marshall Rose, a long-time "protocol wonk" within the IETF community. Marshall has authored more than 60 RFCs, covering everything from core SNMP details to a DTD for IETF RFCs.

Like SOAP 1.2, BEEP is described in a transport-neutral manner and is defined in RFC 3080. The most common transport is TCP, and the TCP mapping is found in RFC 3081. If you are at all interested in network protocols, you should read RFC 3117, "On the Design of Application Protocols". At the BEEP site you can find HTML versions of BEEP and related RFCs generated from the DTD mentioned above.

BEEP actually addresses a wider range of problems than just SOAP and binary data. According to the RFC, BEEP is "a generic application protocol kernel for connection-oriented, asynchronous interactions." In other words, it's like an application-level transport protocol. Unlike many application protocols, it supports more than just lockstep request-response interchanges, which makes it suitable for applications such as event logging (see RFC 3195, for "Reliable Syslog"), and general peer-to-peer interactions.

One of the more interesting implications of BEEP's design principles is that it supports multiplexing -- that is, multiple channels communicating over a single transport stream. BEEP allows different applications, or multiple instances of the same application, to use the same stream for independent activities, including independent security mechanisms. For example, a single BEEP-over-TCP link between a browser and a web server would efficiently allow the browser to fetch a page over SSL-encrypted HTTP, while also streaming in multiple images over unencrypted HTTP.

A BEEP message is the complete set of data to be exchanged. BEEP defines three styles of message exchange, distinguished by the type of reply the server (or, more accurately, the message recipient) returns:

Message/reply: the server performs a task and sends a positive reply back.
Message/error: the server does not perform the task and sends a negative reply back.
Message/answer: the server sends back zero or more answers, followed by a termination indicator. This style is appropriate for streaming data, for example.
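To make the three styles concrete, here is a minimal Python sketch that names the style implied by the replies seen for one message. It uses the three-letter frame type keywords BEEP assigns to these replies (RPY, ERR, ANS, and NUL for the terminator); the function name `classify_exchange` and the idea of passing the reply types as a list are my own, purely for illustration.

```python
# Reply types that end an exchange; ANS frames are followed by a NUL terminator.
FINAL_REPLY_TYPES = {"RPY", "ERR", "NUL"}


def classify_exchange(reply_types):
    """Name the exchange style implied by the replies received for one MSG.

    `reply_types` is the ordered list of reply type keywords, for example
    ["ANS", "ANS", "NUL"]. Returns None while the exchange is still open.
    """
    if not reply_types or reply_types[-1] not in FINAL_REPLY_TYPES:
        return None  # no terminating reply has arrived yet
    *earlier, last = reply_types
    if last == "RPY" and not earlier:
        return "message/reply"
    if last == "ERR" and not earlier:
        return "message/error"
    if last == "NUL" and all(t == "ANS" for t in earlier):
        return "message/answer"
    raise ValueError(f"inconsistent reply sequence: {reply_types}")
```

A lone NUL counts as message/answer with zero answers, which matches the "zero or more" wording above.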

Messages are divided into frames. The specification says that a message is normally sent in a single frame. Frames consist of a header, a payload, and a trailer. The header and trailer are encoded in printable ASCII and terminated with a CRLF -- in other words, typical IETF protocol style. The payload is an arbitrary set of octets.

The frame headers are almost identical and are structured as follows,

type SP channel SP msgno SP more SP seqno SP size CR LF

where SP, CR, and LF stand for the ASCII space, carriage-return, and linefeed characters respectively.

The type specifies the message type and is a three-byte string -- MSG (message), RPY (reply), ANS (answer), ERR (error), or NUL (terminator) -- corresponding to the message types listed in the exchange patterns above.

The channel identifies the multiplex channel of the communication and is the printed form of a number between zero and 2³¹ - 1. BEEP reserves channel zero for management tasks, like creating new channels. So the simplest BEEP application will need two channels and is, therefore, likely to end up being multiplex-ready.

The msgno uniquely identifies the message. It's a number in the same format as channel and acts like a session identifier: a reply to a given message will have the same msgno. A msgno cannot be reused until the final response packet -- RPY, ERR, or NUL -- has been received.

The more indicator is a period if this is the only or last frame of a message. It's an asterisk if at least one other frame follows for this message. Note that frames can have a zero-length payload, so that final EOF can be sent by itself.
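The interplay of channel, msgno, and more can be sketched as a small reassembly buffer. This is an illustrative sketch, not how any particular BEEP library does it: the class name `Reassembler` is made up, it assumes the header has already been parsed, and it ignores the extra bookkeeping that message/answer replies need.

```python
from collections import defaultdict


class Reassembler:
    """Collect frame payloads per (channel, msgno) until the last frame arrives."""

    def __init__(self):
        # Partial messages, keyed by the channel and msgno they belong to.
        self._partial = defaultdict(bytearray)

    def add_frame(self, channel, msgno, more, payload):
        """Return the whole message as bytes once `more` is '.', else None."""
        if more not in ("*", "."):
            raise ValueError(f"bad continuation indicator: {more!r}")
        key = (channel, msgno)
        self._partial[key] += payload
        if more == "*":
            return None  # at least one more frame follows
        return bytes(self._partial.pop(key))
```

Because the buffer is keyed by channel as well as msgno, frames from different channels can be interleaved on the same stream without getting mixed up:

```python
r = Reassembler()
r.add_frame(1, 0, "*", b"hel")    # channel 1, first part
r.add_frame(2, 0, ".", b"other")  # channel 2 completes in one frame
r.add_frame(1, 0, ".", b"lo")     # channel 1 completes: b"hello"
```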

The seqno is an unsigned 32-bit number (i.e., twice the range of the other number fields) and specifies the offset of the first octet in the current payload. This is different from the conventional use of the term "sequence number"; perhaps "offset" might have been a better choice. In most cases, the seqno of frame n will be the sum of the payload lengths of the prior n - 1 frames. Some applications, however, might want to use BEEP efficiently by omitting large sets of default values. For example, a distributed file system protocol could avoid sending a large number of all-zero-byte frames.
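Here is a rough sketch of how a sender might split one message into frames, setting more and seqno as described. The function name `build_frames` and the `max_payload` parameter are hypothetical. I am also assuming that seqno wraps around at 2³², as a 32-bit counter naturally would, and that the trailer is the literal END line seen in the examples later in this article. ANS frames, which carry an extra field, are left out.

```python
def build_frames(frame_type, channel, msgno, payload, start_seqno=0, max_payload=4096):
    """Split `payload` (bytes) into frames; return (frames, next_seqno)."""
    # An empty message still needs one frame, with a zero-length payload.
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    frames = []
    seqno = start_seqno
    for index, chunk in enumerate(chunks):
        more = "." if index == len(chunks) - 1 else "*"
        header = f"{frame_type} {channel} {msgno} {more} {seqno} {len(chunk)}\r\n"
        frames.append(header.encode("ascii") + chunk + b"END\r\n")
        # seqno is an octet offset, so it advances by the payload length.
        seqno = (seqno + len(chunk)) % 2**32
    return frames, seqno
```

With a ten-octet payload and `max_payload=4`, the three headers come out as `MSG 1 0 * 0 4`, `MSG 1 0 * 4 4`, and `MSG 1 0 . 8 2`: the offsets are running totals, not frame counts.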

If a server is sending multiple replies to the client -- the message/answer message pattern -- there will be an ansno field to identify each answer. The combination of ansno and more is similar to nested messages within DIME and the MB/MF/CF bits.

The final field is the size, which has the obvious meaning: the number of octets in the frame's payload. While BEEP is often used for XML or other textual data, payloads can be arbitrary. Payloads are MIME objects in that they may have MIME headers describing the type and encoding of the payload. BEEP adds two rules: the default Content-Type is application/octet-stream, and the default transfer encoding is binary. This seems like a reasonable trade-off: binary data has no overhead, and text data has a standard but minor overhead to describe its type.
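The two default rules are easy to show in code. The sketch below splits a reassembled payload into its MIME headers and body and fills in BEEP's defaults when a header is missing. The function name `split_mime_payload` is made up, and I am assuming that the header section always ends with a blank line (even when it is empty) and that headers are not folded across lines.

```python
def split_mime_payload(payload):
    """Return (content_type, transfer_encoding, body) for a BEEP payload."""
    if payload.startswith(b"\r\n"):
        # Assumed case: no MIME headers at all, just the blank separator line.
        header_block, body = b"", payload[2:]
    else:
        header_block, separator, body = payload.partition(b"\r\n\r\n")
        if not separator:
            raise ValueError("missing blank line after MIME headers")
    headers = {}
    for line in header_block.split(b"\r\n"):
        if line:
            name, _, value = line.decode("ascii").partition(":")
            headers[name.strip().lower()] = value.strip()
    # BEEP's two defaults, applied only when the header is absent.
    content_type = headers.get("content-type", "application/octet-stream")
    encoding = headers.get("content-transfer-encoding", "binary")
    return content_type, encoding, body
```

Binary data therefore costs only the two-octet blank line under this assumption, while XML pays for one Content-Type header.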

There is a set of simple constraints among the various fields that lets implementations do a large number of error checks. Over a dozen are documented in the RFC.
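As a taste of those checks, here is a hedged sketch of a header parser that enforces a handful of them. It is not a complete implementation of the RFC's rules. The names `FrameHeader` and `parse_header` are mine, the upper bound on ansno is an assumption, and the NUL rule (a terminator must be a final, empty frame) is my reading of the RFC.

```python
from dataclasses import dataclass
from typing import Optional

FRAME_TYPES = {"MSG", "RPY", "ANS", "ERR", "NUL"}
MAX_31_BIT = 2**31 - 1  # channel, msgno, and size
MAX_32_BIT = 2**32 - 1  # seqno (and, by assumption here, ansno)


@dataclass
class FrameHeader:
    type: str
    channel: int
    msgno: int
    more: str
    seqno: int
    size: int
    ansno: Optional[int] = None


def _number(text, limit, field):
    """Parse a decimal field and enforce its upper bound."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"{field} is not a decimal number: {text!r}")
    value = int(text)
    if value > limit:
        raise ValueError(f"{field} out of range: {value}")
    return value


def parse_header(line):
    """Parse one header line, CRLF included, into a FrameHeader."""
    if not line.endswith("\r\n"):
        raise ValueError("header must end with CRLF")
    fields = line[:-2].split(" ")
    if fields[0] not in FRAME_TYPES:
        raise ValueError(f"unknown frame type: {fields[0]!r}")
    # Only ANS frames carry the extra ansno field.
    expected = 7 if fields[0] == "ANS" else 6
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    if fields[3] not in ("*", "."):
        raise ValueError(f"bad continuation indicator: {fields[3]!r}")
    header = FrameHeader(
        type=fields[0],
        channel=_number(fields[1], MAX_31_BIT, "channel"),
        msgno=_number(fields[2], MAX_31_BIT, "msgno"),
        more=fields[3],
        seqno=_number(fields[4], MAX_32_BIT, "seqno"),
        size=_number(fields[5], MAX_31_BIT, "size"),
        ansno=_number(fields[6], MAX_32_BIT, "ansno") if expected == 7 else None,
    )
    # A NUL terminator is a complete frame with no payload.
    if header.type == "NUL" and (header.more != "." or header.size != 0):
        raise ValueError("NUL frame must be final and empty")
    return header
```

Checks that span several frames, such as verifying that each seqno continues where the previous payload ended, need per-channel state and are left out of this sketch.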

BEEP's initial connection setup follows standard IETF practice: the client (or initiator) connects to a remote port, and the server (or listener) responds with a greeting message. The greeting message includes profile elements in which the server lists its supported facilities. For example, a server that has just accepted a connection might send the following greeting, indicating SOAP and TLS (SSL) support:

RPY 0 0 . 0 143
Content-Type: application/beep+xml

<greeting>
  <profile uri="http://iana.org/beep/TLS"/>
  <profile uri="http://iana.org/beep/soap"/>
</greeting>
END

From the frame header, we see that this is a reply (to the initial TCP connection), that it is the first message on the management channel, and that it is completely contained in this frame. This is a simple greeting; optional elements include codeset localization and identification of management features.
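On the receiving side, the interesting part of a greeting is its list of profile URIs. The following sketch, which uses Python's standard `xml.etree.ElementTree` module, pulls them out and picks the first one the client cares about. The function names are hypothetical, and the payload is assumed to have had its MIME headers removed already.

```python
import xml.etree.ElementTree as ET


def greeting_profiles(xml_text):
    """Return the profile URIs advertised in a <greeting> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "greeting":
        raise ValueError(f"expected <greeting>, got <{root.tag}>")
    return [profile.get("uri") for profile in root.findall("profile")]


def choose_profile(offered, wanted):
    """Return the first URI in `wanted` that the peer offered, or None."""
    for uri in wanted:
        if uri in offered:
            return uri
    return None
```

An empty `<greeting/>`, like the one a client typically sends, simply yields an empty list.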

The client now acknowledges the greeting message, immediately indicating that it wants to start a SOAP channel:

RPY 0 0 . 0 51
Content-Type: application/beep+xml

<greeting/>
END
MSG 0 1 . 51 138
Content-Type: application/beep+xml

<start number='1' serverName='www.example.com'>
  <profile uri='http://iana.org/beep/soap'/>
</start>
END

The number attribute identifies the channel, and the serverName attribute identifies the "virtual server" the client wishes to work with. A client can specify multiple profiles, and the server will determine which ones it can support and send them back in its reply:

RPY 0 1 . 143 79
Content-Type: application/beep+xml

<profile uri='http://iana.org/beep/soap'/>
END

Now both parties are using a single TCP connection for management and plain text SOAP communication. BEEP is a fairly simple protocol, allowing you to build powerful communication infrastructures with modest work.
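The channel-start exchange is small enough to sketch with the standard library. The helper names below are made up for illustration, and the code deals only with the XML payloads, not with the frame headers or MIME headers around them.

```python
import xml.etree.ElementTree as ET


def build_start(channel_number, server_name, profile_uris):
    """Build the <start> payload that asks the peer to open a channel."""
    start = ET.Element(
        "start", {"number": str(channel_number), "serverName": server_name})
    # The client may offer several profiles; the server picks among them.
    for uri in profile_uris:
        ET.SubElement(start, "profile", {"uri": uri})
    return ET.tostring(start, encoding="unicode")


def accepted_profile(reply_xml):
    """Return the profile URI the server chose, or None for any other reply."""
    root = ET.fromstring(reply_xml)
    if root.tag == "profile":
        return root.get("uri")
    return None
```

Calling `build_start(1, "www.example.com", ["http://iana.org/beep/soap"])` produces XML equivalent to the start message shown above, though `ElementTree` uses double quotes around attribute values.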

The full SOAP-over-BEEP specification is RFC 3288. It says that, once the channel has been opened, both sides enter a "boot" phase. During this phase information equivalent to the classic SOAP-over-HTTP headers is sent -- that is, the URI that would be POSTed to. The boot message is also a placeholder for future feature negotiation and has a trivial format:

<bootmsg resource='StockQuote'/>

As an optimization, BEEP allows a boot message and its reply to be piggy-backed into the channel start message, avoiding extra network round-trips (and latency). In practice, then, here is what a real start message and its reply would look like:

MSG 0 1 . 51 197
Content-Type: application/beep+xml

<start number='1' serverName='www.example.com'>
  <profile uri='http://iana.org/beep/soap'>
  <![CDATA[<bootmsg resource='StockQuote'/>]]>
  </profile>
</start>
END

RPY 0 1 . 143 112
Content-Type: application/beep+xml

<profile uri='http://iana.org/beep/soap'>
<![CDATA[<bootrpy/>]]>
</profile>
END
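Since the piggy-backed boot message travels as CDATA, a receiver sees it as plain text inside the profile element and has to parse it a second time. Here is a minimal sketch of that two-step parse; the function name `boot_resource` is hypothetical, and the input is assumed to be the XML payload with MIME headers already removed.

```python
import xml.etree.ElementTree as ET


def boot_resource(start_xml):
    """Return the resource named by a piggy-backed <bootmsg>, or None."""
    start = ET.fromstring(start_xml)
    for profile in start.findall("profile"):
        # The CDATA section shows up as the profile element's text content.
        inner = (profile.text or "").strip()
        if not inner:
            continue  # nothing piggy-backed on this profile
        boot = ET.fromstring(inner)  # second parse: the embedded document
        if boot.tag == "bootmsg":
            return boot.get("resource")
    return None
```

For the start message above this returns `StockQuote`. For a start message with an empty profile element it returns None, meaning the boot exchange still has to happen on the new channel.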


Having established communication, simple attachments become trivial: send the message as multi-part MIME, as specified by the SOAP with Attachments Note. This is, in fact, the only attachment scheme currently specified in RFC 3288, which is unfortunate. SOAP should be able to take full advantage of the powerful asynchronous multiplex communications core provided by BEEP. It would seem that only two things are needed: first, a URN scheme for naming BEEP channels; second, a way for a SOAP message to refer to data coming in over separate channels, streaming in as it becomes available. Now, that would be cool.

I'll close with the following plea, which I'll explain: Don, pick BEEP. Back in February, Don Box pointed out that HTTP has a number of limitations for use as a web services protocol. His criticisms are right on. It's my sincere hope that we don't get a new protocol: BEEP seems to meet all his requirements.