
Protocol Design: The Need for Speed

February 25, 2004

Itamar Shtull-Trauring

One of the common functions protocols provide is transferring data. It is quite common for multiple pieces of data to be requested; for example, a POP3 client requesting multiple emails. The way this is supported in the protocol can significantly affect the speed of data transfer, as well as the ease of protocol implementation. As always, protocols should be designed to be as simple as possible (and no simpler) and thus easy to implement. But what makes a fast protocol?

When dealing with network speed, there are two main measures: bandwidth and latency. Bandwidth defines how much data (typically some multiple of bytes) can be sent or received per second. A DSL line typically has a download bandwidth of around 100 kilobytes/sec and an upload bandwidth of 20 kilobytes/sec. Latency is the time it takes a message to reach another host, usually on the order of 30-100 milliseconds for DSL in the USA. With latency, unlike bandwidth, lower is better. Satellite-based internet connections have high bandwidth, but also high latency; it takes longer to bounce signals off a satellite than to send them over wires.

Optimizing protocols for bandwidth is not very hard: minimize the amount of overhead (e.g. use the command "GET" rather than the command "PLEASEDOWNLOADTHEFILE"). In any case, in bandwidth-limited tasks such as transferring large files, the data itself, rather than the protocol's per-message overhead, takes up the bulk of the time. Minimizing latency is the name of the game.


The simple strategy for downloading multiple pieces of data is as follows (used, for example, by the original POP3 protocol specification):

  1. Send request for data.
  2. Wait until all the data has been received.
  3. Repeat process with next chunk of data.

The transcript for POP3 would look something like this:


C:  RETR 1
S:  +OK 120 octets
S:  <the POP3 server sends message 1>
S:  .
C:  RETR 2
S:  +OK 112 octets
S:  <the POP3 server sends message 2>
S:  .
...

At this point a little arithmetic is helpful. Assume it takes one second for a message to reach the other host, the available bandwidth is ten kilobytes/second, and the client is downloading five messages of ten kilobytes each. Each email fetched takes one second for the download command to arrive and one second for the email to be downloaded (10 kilobytes at 10 kilobytes/sec), for a total of (1 + (10 / 10)) * 5 = 10 seconds. Had the client downloaded a single fifty kilobyte message instead, the total time would have been (1 + (50 / 10)) = 6 seconds. Why does downloading the same amount of data take longer if it is split up into multiple messages? Latency. The time it takes for each new download command to reach the server is slowing the client down.
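
The arithmetic can be spelled out in a few lines of Python; the numbers are the assumptions from the example above, not measurements:

latency = 1.0        # seconds for a request to reach the server
bandwidth = 10.0     # kilobytes per second
count = 5            # number of messages
size = 10.0          # kilobytes per message

# One round trip per message, plus the transfer time for each message.
sequential = count * (latency + size / bandwidth)

# One round trip total for the same amount of data in a single message.
combined = latency + (count * size) / bandwidth

print(sequential, combined)   # 10.0 6.0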

The solution to the problem is known as pipelining. All the requests are sent in one go at the beginning, so that the latency hit only happens once. Then the responses are sent back one by one. They can be matched up with the requests since they are sent back in the same order the requests were sent. Pipelining requires the server to be able to handle incoming commands while it is still handling a previous command. POP3 supports pipelining via a protocol extension, with the transcript looking something like the following (note that the second command from the client may arrive while the server is sending the response to the first command or as it's starting):


C:  RETR 1
C:  RETR 2
C:  RETR 3
S:  +OK 120 octets
S:  <the POP3 server sends message 1>
S:  .
S:  +OK 112 octets
S:  <the POP3 server sends message 2>
S:  .
S:  +OK 201 octets
S:  <the POP3 server sends message 3>
S:  .
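
As a rough illustration, here is a minimal sketch of a pipelined POP3 client in Python. It assumes the server supports the pipelining extension, skips authentication and error handling, and only undoes basic dot-stuffing; it is meant to show the shape of the technique, not to be a complete implementation.

import socket

def retrieve_pipelined(host, message_numbers, port=110):
    """Send all RETR commands up front, then read the responses in order."""
    sock = socket.create_connection((host, port))
    f = sock.makefile("rwb")

    def send(line):
        f.write(line.encode("ascii") + b"\r\n")

    def read_status():
        return f.readline()          # the +OK/-ERR status line

    read_status()                    # server greeting
    # (authentication with USER/PASS would go here)

    # Pipelining: every request goes out in one go, so the
    # round-trip latency is only paid once.
    for n in message_numbers:
        send("RETR %d" % n)
    f.flush()

    messages = []
    for n in message_numbers:
        read_status()                # "+OK ... octets"
        lines = []
        while True:
            line = f.readline().rstrip(b"\r\n")
            if line == b".":         # a lone dot ends the message
                break
            if line.startswith(b".."):
                line = line[1:]      # undo dot-stuffing
            lines.append(line)
        messages.append(b"\r\n".join(lines))

    send("QUIT")
    f.flush()
    sock.close()
    return messages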

Pipelining is useful in more cases than just transferring data. It can be used to speed up any protocol based on request/response pairs. In fact, the speedup will be even more dramatic in cases where the amount of returned data is small and latency becomes the main bottleneck.

For some protocols, different requests take different amounts of time for the server to process. For example, it takes an HTTP server longer to run a CGI script than to read a static file from disk. Since processing time can also be a factor in the speed of the protocol, the solution is some form of parallelism.

One way to add some amount of parallelism to a protocol is by weakening the constraints on the order of results. Assume request #1 takes two seconds to process and request #2 can be processed immediately, and that sending either response to the client takes two seconds. If the responses are sent in order, the time it takes to send responses is 2 + 2 + 2 = 6 seconds. If response #2 is sent before response #1, the time is 2 + 2 = 4 seconds, as response #1 is being sent in parallel to response #2 being processed.
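
The same arithmetic, written out in Python using the assumed numbers above:

processing = {1: 2.0, 2: 0.0}   # seconds of server-side work per request
transmit = 2.0                  # seconds to send either response

# In order: response 2 cannot start until response 1 is processed and sent.
in_order = processing[1] + transmit + transmit

# Out of order: response 2 is transmitted while request 1 is still
# being processed, so only the longer of the two counts.
out_of_order = max(processing[1], transmit) + transmit

print(in_order, out_of_order)   # 6.0 4.0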

This obviously only works if changing the order of processing won't affect the result in any important way (reading item 1 and then deleting item 1 won't work if the order is reversed). Slow processing can run in parallel to sending other responses over the network, and in some cases processing of multiple requests can also be done in parallel. The problem of matching up responses to requests is solved by message identifiers. Each request has a unique identifier, and the corresponding response also has the same identifier. Thus, they can be matched up in whatever order they are sent. IMAP supports this method of parallelism:


C:  a002 CREATE important/
C:  a003 NOOP
S:  a003 OK NOOP completed
S:  a002 OK CREATE completed
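
A sketch of the client-side bookkeeping might look like this in Python; the transport object and its send_line method are hypothetical stand-ins for whatever actually writes lines to the connection:

import itertools

class TaggedConnection:
    """Match out-of-order responses to requests using IMAP-style tags."""

    def __init__(self, transport):
        self.transport = transport
        self.pending = {}                    # tag -> callback
        self.counter = itertools.count(2)

    def send_request(self, command, callback):
        tag = "a%03d" % next(self.counter)   # a002, a003, ...
        self.pending[tag] = callback
        self.transport.send_line("%s %s" % (tag, command))
        return tag

    def line_received(self, line):
        # Responses may arrive in any order; the tag identifies
        # which outstanding request each one answers.
        tag, _, rest = line.partition(" ")
        callback = self.pending.pop(tag, None)
        if callback is not None:
            callback(rest)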

The problem with this limited form of parallelism is that long responses will block the connection. Any other responses will have to wait until the large amount of data for that response has been completely sent, so certain responses might have much higher latency than is wanted. In a protocol that supports both file downloads and chat, it is not reasonable to expect the highly interactive, latency-sensitive chat functionality to freeze for minutes at a time while a file is downloaded. In these sorts of protocols a stronger form of parallelism is required: parallelism in the use of the network connection itself. One approach is to limit the size of the messages that can be sent across the connection. A large file would have to be broken into multiple messages, and other messages could be sent in between. Some method of preventing a single series of messages from monopolizing the connection is also required.
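
A sketch of the idea in Python, with a made-up frame format (a stream identifier and a length prefix in front of each chunk; no real protocol specifies this exact layout):

import struct

MAX_CHUNK = 4096   # cap on payload size so no one stream hogs the connection

def frames(stream_id, data):
    """Split data into frames: 4-byte stream id, 4-byte length, payload."""
    for offset in range(0, len(data), MAX_CHUNK):
        chunk = data[offset:offset + MAX_CHUNK]
        yield struct.pack("!II", stream_id, len(chunk)) + chunk

def interleave(streams):
    """Round-robin frames from several streams over one connection,
    so a large file transfer cannot block a small chat message."""
    pending = [frames(stream_id, data) for stream_id, data in streams]
    while pending:
        for gen in list(pending):
            try:
                yield next(gen)
            except StopIteration:
                pending.remove(gen)

A file download and a chat message could then be sent as, say, interleave([(1, file_bytes), (2, chat_bytes)]), with the receiver reassembling each stream from its frames.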

Another possible solution is using multiple TCP connections to support this sort of parallelism, which is the approach taken by FTP. But it's usually not a good idea, as it will often require some way of sharing the session, authentication credentials, and so on across multiple connections. In addition, opening a new TCP connection takes time; again, the problem is latency. Instead, a similar solution can be implemented: multiple channels or streams running across a single TCP connection. To the users of such a stream, it behaves much like a TCP connection, but because all the streams run across a single TCP connection they can easily share authentication credentials and so on.

For example, the SSH protocol allows users to forward data from a port on the remote server to a local port on the client machine (or vice versa). Multiple such streams can run in parallel using this technique, which is called "multiplexing". All the streams running across the same TCP connection share the same permissions on the server.

Multiplexing has its own problems. In order to implement it, a whole set of issues needs to be dealt with: starting streams, ending streams, preventing a single stream from monopolizing the connection, and so on. When using TCP these are handled automatically, but with a custom protocol they need to be implemented from scratch, though with less effort since TCP already provides some of the necessary services (e.g. reliable data transfer). The BEEP framework exists to minimize these problems; it provides the infrastructure (protocol definitions and reference implementations) for implementing multiplexing protocols.

The design of a protocol can strongly affect its performance. In virtually all cases the main issue protocols need to deal with is latency. The goal of any solution is to ensure the network connection is not left under-utilized while waiting for client messages, for slow commands to finish processing, and so on. Latency can also be an issue when dealing with parallel commands, and there the solution involves preventing any one message or operation from monopolizing the connection. The speed (perceived or real) of a protocol can strongly affect its acceptance. A good protocol will squeeze all it can out of the physical constraints of bandwidth and latency.