XML.com: XML From the Inside Out


A Technical Introduction to XML
by Norman Walsh

Appendix: Extended Backus-Naur Form (EBNF)

One of the most significant design improvements in XML is that it is easy to use with modern compiler tools. Part of this improvement comes from expressing the syntax of XML in Extended Backus-Naur Form (EBNF) [Section 6]. If you've never seen EBNF before, think of it this way:

  • EBNF is a set of rules, called productions.
  • Every rule describes a specific fragment of syntax.
  • A document is syntactically correct if it can be reduced to a single, specific rule, with no input left, by repeated application of the rules.

Let's take a simple example that has nothing to do with XML (or the real rules of language):

[1] Word      ::= Consonant Vowel+ Consonant

[2] Consonant ::= [^aeiou]

[3] Vowel     ::= [aeiou] 

Rule 1 states that a word is a consonant followed by one or more vowels followed by another consonant. Rule 2 states that a consonant is any letter other than a, e, i, o, or u. Rule 3 states that a vowel is any of the letters a, e, i, o, or u. (The exact syntax of the rules, the meaning of square brackets and other special symbols, is laid out in the XML specification.)

Using the rules above, is red a Word? Yes.

  1. red is the letter r followed by the letter e followed by the letter d: 'r' 'e' 'd'.
  2. r is a Consonant by rule 2, so red is: Consonant 'e' 'd'.
  3. e is a Vowel by rule 3, so red is: Consonant Vowel 'd'.
  4. d is a Consonant by rule 2 again, so red is: Consonant Vowel Consonant, which, by rule 1, is a Word.
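
The reduction steps above can be sketched in a few lines of Python. This is only an illustration, not how a real parser is built: the function names classify and is_word are made up for this sketch, and it assumes that "letter" means a lowercase ASCII letter.

```python
import string

VOWELS = set("aeiou")
# Assumption: "letter" means a lowercase ASCII letter.
CONSONANTS = set(string.ascii_lowercase) - VOWELS


def classify(text):
    """Apply rules 2 and 3: replace each letter with its symbol name."""
    symbols = []
    for ch in text:
        if ch in VOWELS:
            symbols.append("Vowel")
        elif ch in CONSONANTS:
            symbols.append("Consonant")
        else:
            return None  # not a letter covered by rule 2 or rule 3
    return symbols


def is_word(text):
    """Apply rule 1: Consonant Vowel+ Consonant, with no input left over."""
    symbols = classify(text)
    if symbols is None or len(symbols) < 3:
        return False
    return (
        symbols[0] == "Consonant"
        and symbols[-1] == "Consonant"
        and all(s == "Vowel" for s in symbols[1:-1])
    )


print(classify("red"))  # ['Consonant', 'Vowel', 'Consonant']
print(is_word("red"))   # True
```

The first function corresponds to steps 2 through 4 of the walk-through, and the second to the final application of rule 1.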

By the same analysis, reed, road, and xeaiioug are also words, but rate is not. There is no way to match Consonant Vowel Consonant Vowel using the EBNF above.

XML is defined by an EBNF grammar of about 80 rules. Although the rules are more complex, the same sort of analysis allows an XML parser to determine that <greeting>Hello World</greeting> is a syntactically correct XML document while <greeting]Wrong Bracket!</greeting> is not.
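
To see an actual parser make that decision, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name is_well_formed is invented for this example, and the parser behind ElementTree is hand-written rather than generated from the EBNF, so this shows only the outcome of the analysis, not the rule-by-rule reduction.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<greeting>Hello World</greeting>"))     # True
print(is_well_formed("<greeting]Wrong Bracket!</greeting>"))  # False
```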

In very general terms, that's all there is to it. You'll find all the details about EBNF in Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman or in any modern compiler textbook.

While EBNF isn't an efficient way to represent syntax for human consumption, there are programs that can automatically turn EBNF into a parser. This makes it a particularly efficient way to represent the syntax for a language that will be parsed by a computer.
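
As a rough illustration of turning rules into a matcher mechanically, the sketch below expands the three-rule Word grammar from earlier in this appendix into a regular expression. The RULES encoding and the names to_regex and matches_word are assumptions made for this example. Simple substitution like this works only because the toy grammar is not recursive; real parser generators handle far more than this.

```python
import re

# The three example rules, written as plain strings (illustrative encoding).
RULES = {
    "Word": "Consonant Vowel+ Consonant",
    "Consonant": "[^aeiou]",
    "Vowel": "[aeiou]",
}


def to_regex(symbol):
    """Expand a rule into a regular expression by substituting rule names."""
    parts = []
    for token in RULES[symbol].split():
        suffix = ""
        # Peel off a repetition operator attached to a rule name.
        if token[-1] in "+*?" and token[:-1] in RULES:
            token, suffix = token[:-1], token[-1]
        if token in RULES:
            parts.append("(?:" + to_regex(token) + ")" + suffix)
        else:
            parts.append(token)  # a character class, used as-is
    return "".join(parts)


WORD = re.compile(to_regex("Word"))


def matches_word(text):
    """True if the whole of text matches the Word rule, with no input left."""
    return WORD.fullmatch(text) is not None


print(to_regex("Word"))      # (?:[^aeiou])(?:[aeiou])+(?:[^aeiou])
print(matches_word("road"))  # True
print(matches_word("rate"))  # False
```

Note that the character class [^aeiou] is taken literally here, so it accepts any character other than a, e, i, o, or u, not only letters.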

Revision History

Revision 1.1.1 18 Sep 1998 Revised by: nwalsh
Draft of update with respect to the final W3C Recommendation of 10 Feb 1998.
Revision 1.1 18 Feb 1998 Revised by: nwalsh
The title of this article has been changed. The former title was simply An Introduction to XML. In preparing this article for publication on my own web site, I've added a couple of sections that were cut from the Journal version because the content overlapped with other articles. Note: this article has not yet been updated to reflect changes that occurred between the XML working draft that was current in September 1997 and the final recommendation of February 1998.