Protocol Design: Structure and Syntax

April 21, 2004

Itamar Shtull-Trauring

Protocols use syntax (the way the data they send is formatted and organized) to send and receive structured information. A POP3 server knows that the message RETR 1, followed by a CRLF, should be parsed by splitting it on the space character. The first part is a command, and the second is an argument to that command: in this case, an integer index written in decimal, using the ASCII characters for numerals (the bytes ONE would be meaningless to the server). Thus, a sequence of bytes formatted according to a known syntax corresponds to a structured message: a command with an integer argument.
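As a rough illustration, the POP3 parsing just described might look like this in Python. The function names are hypothetical, and the sketch handles only RETR; a real server would also need line-length limits and the rest of the POP3 command set.

```python
from __future__ import annotations


def parse_pop3_command(line: bytes) -> tuple[str, list[str]]:
    """Split a raw POP3 command line into a keyword and its arguments."""
    if not line.endswith(b"\r\n"):
        raise ValueError("command must be terminated by CRLF")
    # POP3 commands are ASCII text; decoding fails loudly on anything else.
    text = line[:-2].decode("ascii")
    # Keyword and arguments are separated by single space characters.
    keyword, *args = text.split(" ")
    # POP3 keywords are case-insensitive, so normalize to upper case.
    return keyword.upper(), args


def parse_retr(line: bytes) -> int:
    """Parse a RETR command and return its message number as an int."""
    keyword, args = parse_pop3_command(line)
    if keyword != "RETR" or len(args) != 1:
        raise ValueError("expected: RETR <message number>")
    # The argument is a decimal number written with the ASCII digits 0-9;
    # something like "ONE" is rejected rather than guessed at.
    if not args[0].isdigit():
        raise ValueError("message number must be written in decimal digits")
    return int(args[0])


print(parse_retr(b"RETR 1\r\n"))  # 1
```

The two layers mirror the paragraph: a generic split into command and arguments, then a command-specific interpretation of the argument as an integer.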

What design goals should be kept in mind when choosing the syntax for a protocol? The most important are simplicity and consistency. If the syntax is easy to parse and generate, the protocol implementation will be shorter and simpler. This reduces the number of bugs, which in turn makes the implementation less vulnerable to attackers (many, if not most, vulnerabilities in network applications are in the parsing code). Where multiple implementations are expected or encouraged, an easy-to-implement protocol is more likely to be adopted. More importantly, those implementations are less likely to have interoperability problems with one another.

Simplicity should not, of course, limit the functionality of the protocol. The second goal when choosing a protocol's syntax is extensibility: the ability to accommodate future changes and additional functionality.

Another much-touted goal is being human-readable, sometimes described as being a "text protocol" rather than a "binary protocol." Unfortunately, these definitions are vague and, to some degree, meaningless. All protocols are ultimately sequences of bytes, which is to say, numbers. Some protocols choose bytes that happen to match the way computers encode English text (that is, bytes that correspond to alphanumeric characters in ASCII) and a syntax that a human can easily understand visually. Even so, some processing is still needed to turn those bytes into something a person can read. A more meaningful and reasonable goal might be to allow the protocol to be easily "parsed" and generated by people (for debugging and testing purposes) with minimal software support.
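To make this concrete, here is the POP3 request from above viewed as the numbers that actually cross the network, next to a hypothetical binary encoding of the same request. The one-byte command code and four-byte index in the binary version are invented for this sketch and are not part of POP3.

```python
import struct

# The POP3 request from the example, exactly as it travels over the network.
message = b"RETR 1\r\n"

# Underneath, it is just a sequence of numbers.
print(list(message))  # [82, 69, 84, 82, 32, 49, 13, 10]

# Reading it as ASCII is the "minimal software support" that makes it
# legible to a person (repr shows the CR LF terminator explicitly).
print(repr(message.decode("ascii")))  # 'RETR 1\r\n'

# HYPOTHETICAL binary encoding of the same request: one byte for the command
# (the value 1 for RETR is invented) and a 4-byte big-endian message number.
binary = struct.pack(">BI", 1, 1)
print(binary.hex())  # 0100000001
```

Both forms carry the same structure; the difference is only how much tooling a person needs to read or type them.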

With these goals in mind, two general approaches to syntax design can be considered. The first is to create new syntax for each piece of structured information that needs to be represented. For example, in the following excerpt from a SIP (Session Initiation Protocol, used in VoIP) message, the first line is a command in one syntax, whose second argument uses the URI syntax, while the remaining lines use a different syntax for key-value pairs (headers). Each of these headers then has its own syntax for the value it represents. The Via header, for example, records the address of the client that sent the message, while the From and To headers use a different syntax for a different form of address.


Via: SIP/2.0/UDP
From: <>
To: <>
Contact: "John Smith" <sip:smith@>



This approach has a number of problems. Each new piece of structured information requires a new specification, plus new code to generate and parse it. If a specification is vague, different implementations will output the same data in different ways. Extending the protocol or adding new information typically means creating new syntax, which can cause backward-compatibility problems with older parsers.
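The cost of per-item syntax shows up directly in code: every header with its own value syntax needs its own parser. The following Python sketch uses simplified, hypothetical parsers for the Via and address headers (nowhere near complete with respect to RFC 3261, which also allows case-insensitive and compact header names), and the example.com hosts are placeholders.

```python
from __future__ import annotations

import re

# Each header value has its own mini-syntax, so each needs its own parser.
# These are simplified sketches, far from complete with respect to RFC 3261.


def parse_via(value: str) -> dict:
    # e.g. "SIP/2.0/UDP client.example.com:5060"
    protocol, _, sent_by = value.partition(" ")
    name, version, transport = protocol.split("/")
    return {"protocol": name, "version": version,
            "transport": transport, "sent_by": sent_by.strip()}


def parse_name_addr(value: str) -> dict:
    # e.g. '"John Smith" <sip:smith@example.com>' or '<sip:alice@example.com>'
    match = re.fullmatch(r'\s*(?:"([^"]*)"\s*)?<([^>]*)>\s*', value)
    if match is None:
        raise ValueError(f"unsupported address syntax: {value!r}")
    return {"display_name": match.group(1), "uri": match.group(2)}


# Adding a new structured header means writing and registering a new parser.
HEADER_PARSERS = {
    "Via": parse_via,
    "From": parse_name_addr,
    "To": parse_name_addr,
    "Contact": parse_name_addr,
}


def parse_headers(lines: list[str]) -> dict:
    headers = {}
    for line in lines:
        # First layer of syntax: "Name: value".
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Second layer: a header-specific syntax. Headers without a parser
        # can only be kept as opaque strings.
        parser = HEADER_PARSERS.get(name)
        headers[name] = parser(value) if parser else value
    return headers


headers = parse_headers([
    "Via: SIP/2.0/UDP client.example.com:5060",
    "From: <sip:alice@example.com>",
    'Contact: "John Smith" <sip:smith@example.com>',
])
print(headers["Contact"])
# {'display_name': 'John Smith', 'uri': 'sip:smith@example.com'}
```

An older implementation that lacks a parser for a newly defined header can only pass its value through as a string, which is one face of the backward-compatibility problem.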

An alternative approach separates protocol syntax design into two stages. In the first stage, a syntax is chosen that can represent generic structures not tied to any particular protocol. This syntax should be simple, but powerful enough to represent all the information the protocol may need to transmit. In the second stage, the protocol-specific information is expressed in terms of those generic structures, which are then encoded to bytes using the chosen syntax. The result is a single syntax used throughout the protocol, which needs only one parser and can be easily validated. The protocol can be extended by changing the structure of the encoded information, with no need to change the syntax. Of course, care must be taken in both design and implementation to support future changes in structure.
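A minimal sketch of the two-stage approach, assuming JSON as the generic syntax purely for illustration (like XML, it has no native byte-string type). The message layout with command and args fields is hypothetical. A newer message with an extra field is still understood by the old handler, because the parser never changes and the handler ignores fields it does not know.

```python
import json


# Stage 1: one generic syntax for the whole protocol (JSON, for illustration).
def encode(structure) -> bytes:
    return json.dumps(structure, separators=(",", ":"), sort_keys=True).encode("utf-8")


def decode(data: bytes):
    return json.loads(data.decode("utf-8"))


# Stage 2: protocol messages are plain structures. This layout is HYPOTHETICAL.
def make_retr(index: int) -> dict:
    return {"command": "RETR", "args": {"index": index}}


def handle(message: dict) -> str:
    # An old implementation reads only the fields it knows and ignores the
    # rest, which lets newer peers add fields without breaking it.
    if message["command"] == "RETR":
        return f"retrieve message {message['args']['index']}"
    raise ValueError(f"unknown command {message['command']!r}")


v1 = encode(make_retr(1))
# A later protocol version adds a field; the syntax and the parser are unchanged.
v2 = encode({"command": "RETR", "args": {"index": 1, "delete_after": True}})

print(v1)                  # b'{"args":{"index":1},"command":"RETR"}'
print(handle(decode(v1)))  # retrieve message 1
print(handle(decode(v2)))  # retrieve message 1
```

The "ignore unknown fields" rule in handle is the kind of implementation care the paragraph refers to: without it, the structural extension would still break old peers.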

Probably the best-known example of this approach is XML. XML allows structured information to be encoded using a well-defined syntax. The structure is that of nested elements, which can have attributes and contain other elements or text. Namespaces allow vocabularies from different schemas to be mixed in the same document. Here's a sample XML message:

<?xml version="1.0" encoding="iso-8859-1"?>
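As a sketch of how such a message can be generated and read with Python's standard xml.etree.ElementTree module (the xml_declaration argument needs Python 3.8+): the element names, the address, and the urn:example:receipts namespace are all hypothetical. The namespaced element shows a second vocabulary mixed into the same document.

```python
import xml.etree.ElementTree as ET

RECEIPTS_NS = "urn:example:receipts"  # HYPOTHETICAL namespace URI

# Generate: build the message as a tree of elements, then serialize it.
message = ET.Element("message", {"to": "bob@example.com"})
ET.SubElement(message, "body").text = "Hello, Bob"
# An element from a second vocabulary, identified by its namespace.
ET.SubElement(message, f"{{{RECEIPTS_NS}}}request")
data = ET.tostring(message, encoding="utf-8", xml_declaration=True)

# Parse: one generic parser recovers the same structure.
root = ET.fromstring(data)
print(root.get("to"))                                      # bob@example.com
print(root.findtext("body"))                               # Hello, Bob
print(root.find(f"{{{RECEIPTS_NS}}}request") is not None)  # True
```

Neither side needs message-specific parsing code; only the interpretation of the recovered tree is protocol-specific.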





XML's suitability as a structured syntax for protocols depends on the protocol's requirements. XML tends to be verbose compared to a custom syntax. In most cases this is irrelevant, but in some it limits XML's usefulness. For example, SIP messages sent over UDP must fit in a single datagram, which limits them to a fairly small number of bytes. If SIP messages were encoded in XML, they might simply be too big.
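A rough way to see the size difference is to encode the same (hypothetical) contact information both ways and count bytes. The XML element names below are invented for the comparison and do not come from any SIP or XML standard.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) contact information in SIP's compact header syntax...
sip_header = b'Contact: "John Smith" <sip:smith@example.com>\r\n'

# ...and in an invented XML vocabulary (not from any SIP or XML standard).
contact = ET.Element("contact")
ET.SubElement(contact, "display-name").text = "John Smith"
ET.SubElement(contact, "uri").text = "sip:smith@example.com"
xml_header = ET.tostring(contact)  # default us-ascii encoding, no declaration

print(len(sip_header), len(xml_header))  # 47 90
```

Most of the extra 43 bytes is markup: each element name is written twice, once in the start tag and once in the end tag. Across a full message with many headers, that overhead adds up against a fixed datagram budget.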

A more important point is that XML documents are composed of Unicode text (that is, a series of abstract characters such as "Latin capital letter E" or "the letter Dalet in the Hebrew alphabet"), not of a series of bytes. The text is then encoded to bytes using an appropriate Unicode encoding, such as UTF-8 or UTF-16. For protocols that involve communication between humans, Unicode support is an important feature. For example, Jabber, an instant messaging protocol built on XML, can send structured messages in virtually every human language. In many other cases, the fact that messages are Unicode text is unimportant or irrelevant.
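The distinction between characters and bytes is easy to see in Python, where str holds Unicode characters and bytes holds encoded data. In this sketch, a tiny XML element containing U+05D3 (HEBREW LETTER DALET) produces different byte sequences under UTF-8 and UTF-16, yet decodes back to identical text.

```python
import xml.etree.ElementTree as ET

text = "<body>\u05d3</body>"        # 14 characters; U+05D3 is HEBREW LETTER DALET

utf8 = text.encode("utf-8")         # Dalet takes 2 bytes here (D7 93)
utf16 = text.encode("utf-16-be")    # every character here takes 2 bytes

print(len(text), len(utf8), len(utf16))  # 14 15 28

# Different bytes, same characters once decoded with the right encoding.
print(utf8 == utf16)                                      # False
print(utf8.decode("utf-8") == utf16.decode("utf-16-be"))  # True

# An XML parser hands back characters, not bytes. With no declaration,
# XML bytes are read as UTF-8.
print(ET.fromstring(utf8).text == "\u05d3")  # True
```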

Some protocols, however, do have a problem with the fact that XML is composed of Unicode text: XML has no reasonable way of representing raw bytes. Since a JPEG image, for example, is a sequence of bytes, a protocol that needs to send such images is not well suited to a pure XML solution. There are a number of workarounds, including using a separate connection to transfer byte-oriented information (this is how Jabber sends files) and various schemes for combining XML documents with other types of structured byte sequences. Another, smaller issue involves cryptographic signatures, which require a canonical format for the data being signed. Because XML's syntax is flexible, the same information can be represented in a number of different ways (for example, with attributes in a different order or quoted differently). There are standards for XML canonicalization, but not all XML-processing tools support them.
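Both issues can be sketched with Python's standard library. Base64 is one common way to carry bytes inside XML text (it inflates the data by about a third); the attachment element is hypothetical. The second part uses xml.etree.ElementTree.canonicalize (C14N 2.0, Python 3.8+) to reduce two byte-for-byte different documents to one canonical form.

```python
import base64
import xml.etree.ElementTree as ET

# Part 1: carrying bytes inside XML. Raw bytes can't go in directly (a NUL
# byte, for example, is not a legal XML character), so encode them as Base64
# text. The <attachment> element and its attribute are HYPOTHETICAL.
payload = bytes([0xFF, 0xD8, 0xFF, 0x00])  # arbitrary bytes; FF D8 opens a JPEG
attachment = ET.Element("attachment", {"encoding": "base64"})
attachment.text = base64.b64encode(payload).decode("ascii")
doc = ET.tostring(attachment)
recovered = base64.b64decode(ET.fromstring(doc).text)
print(doc)                   # b'<attachment encoding="base64">/9j/AA==</attachment>'
print(recovered == payload)  # True

# Part 2: the same information, serialized two different ways.
a = '<msg to="bob" from="alice"/>'
b = "<msg from='alice' to='bob'></msg>"
print(a == b)  # False: the bytes, and so any signature over them, differ
# C14N 2.0 canonicalization maps both to a single form, which is what a
# signature should be computed over.
print(ET.canonicalize(a) == ET.canonicalize(b))  # True
print(ET.canonicalize(a))
```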

A simpler alternative to XML is the s-expression. An s-expression is essentially a list whose items are byte sequences or other s-expressions; in other words, nested lists. While a number of syntactic representations are possible, a good example is Ron Rivest's, which is used by the SPKI (Simple Public Key Infrastructure) standard. Compared to XML, s-expressions are simpler, less verbose, and handle byte sequences quite nicely, while nesting lists allows complex data structures to be built. S-expressions are thus a suitable replacement where XML may be inappropriate. The following example consists of a list whose first item is an 11-byte sequence, followed by two nested lists:
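Here is a minimal Python sketch of Rivest's canonical encoding, in which each byte sequence is written as its decimal length, a colon, and the raw bytes, and each list is wrapped in parentheses. It encodes a hypothetical message with the shape just described: the 11-byte atom certificate followed by two nested lists (the atom names are illustrative, not taken from SPKI). The decoder is deliberately simplified: it ignores display hints and Rivest's other (transport and advanced) representations, does not reject leading zeros in lengths, and recurses once per nesting level.

```python
from __future__ import annotations

from typing import Union

# An s-expression is a byte string (an "atom") or a list of s-expressions.
SExp = Union[bytes, list]


def encode(sexp: SExp) -> bytes:
    """Canonical form: atoms as <decimal length>:<bytes>, lists in ( )."""
    if isinstance(sexp, bytes):
        return str(len(sexp)).encode("ascii") + b":" + sexp
    return b"(" + b"".join(encode(item) for item in sexp) + b")"


def decode(data: bytes) -> SExp:
    sexp, end = _parse(data, 0)
    if end != len(data):
        raise ValueError("trailing data after s-expression")
    return sexp


def _parse(data: bytes, pos: int) -> tuple[SExp, int]:
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    if data[pos:pos + 1] == b"(":
        items = []
        pos += 1
        while data[pos:pos + 1] != b")":
            item, pos = _parse(data, pos)  # raises if input ends before ")"
            items.append(item)
        return items, pos + 1
    # Otherwise we expect an atom: length digits, a colon, then raw bytes.
    colon = data.find(b":", pos)
    if colon == -1 or not data[pos:colon].isdigit():
        raise ValueError(f"bad length prefix at offset {pos}")
    start = colon + 1
    end = start + int(data[pos:colon])
    if end > len(data):
        raise ValueError("atom runs past end of input")
    return data[start:end], end


msg = [b"certificate", [b"issuer", b"alice"], [b"subject", b"bob"]]
wire = encode(msg)
print(wire)                 # b'(11:certificate(6:issuer5:alice)(7:subject3:bob))'
print(decode(wire) == msg)  # True
# Atoms are raw bytes, so binary data needs no escaping at all:
print(encode([b"data", bytes([0, 255])]))  # b'(4:data2:\x00\xff)'
```

Because every atom announces its length up front, the parser never has to scan for delimiters inside data, which is why arbitrary bytes fit so naturally.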


There are many other standards for encoding various types of structured data, from low-level data types used in RPC systems to high-level data structures encoded on top of XML. Whether you choose one of these standards or design your own format, you will do well to define your protocol's messages as structured data on top of a simple, consistent, and unified syntax. The resulting protocol will tend to be simpler, easier to implement and extend, and less likely to suffer from interoperability and security problems.