
What Is a Schema

July 1, 1999

Norman Walsh

A schema is a model for describing the structure of information. It's a term borrowed from the database world to describe the structure of data in relational tables. In the context of XML, a schema describes a model for a whole class of documents. The model describes the possible arrangement of tags and text in a valid document. A schema might also be viewed as an agreement on a common vocabulary for a particular application that involves exchanging documents.

Schemas may sound a little technical, but we use them to analyze the world around us. For example, suppose I ask you, "Is this a valid postal address?"

<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <zip>12481-6326</zip>
</address>

Mentally, you compare the address presented with a schema that you have in your head for addresses. It probably goes something like this: a postal address consists of a person, possibly at a company or organization, one or more lines of street address, a city, a state or province, a postal code, and an optional country. So, yes, this address is valid.

In schemas, models are described in terms of constraints. A constraint defines what can appear in any given context. Broadly, there are two kinds of constraints: content model constraints, which describe the order and sequence of elements, and datatype constraints, which describe valid units of data.

For example, a schema might describe a valid <address> with the content model constraint that it consist of a <name> element, followed by one or more <street> elements, followed by exactly one <city>, <state>, and <zip> element. The content of a <zip> might have a further datatype constraint that it consist of either a sequence of exactly five digits or a sequence of five digits, followed by a hyphen, followed by a sequence of exactly four digits. No other text is a valid ZIP code.
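
As a rough illustration only, the <zip> datatype constraint just described can be expressed as a regular expression. The Python sketch below is an assumption for illustration: the names ZIP_PATTERN and is_valid_zip are hypothetical and not part of any schema language. It uses re.fullmatch so that the entire text, not just a prefix, must satisfy the pattern, and [0-9] rather than \d so that only ASCII digits are accepted.

```python
import re

# Hypothetical datatype constraint for <zip>: exactly five digits, optionally
# followed by a hyphen and exactly four more digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_valid_zip(text: str) -> bool:
    """Return True if the whole of text satisfies the ZIP datatype constraint."""
    return ZIP_PATTERN.fullmatch(text) is not None


for candidate in ["12481", "12481-6326", "1248", "12481-63", "blue"]:
    print(candidate, is_valid_zip(candidate))
```

Only the first two candidates are accepted; "blue", a four-digit code, and a truncated ZIP+4 are all rejected.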

The purpose of a schema is to allow machine validation of document structure. Every specific, individual document which doesn't violate any of the constraints of the model is, by definition, valid according to that schema.

Using the schema described (informally) above, a validating parser would be able to detect that the following address is not valid:

<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <state>CT</state>
  <zip>blue</zip>
</address>

It violates two constraints of our schema: it does not contain exactly one <state> and the ZIP code is not of the proper form. A formal definition of this schema for addresses is presented in the syntax section.
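
To make the idea of machine validation concrete, here is a minimal Python sketch (not a real schema processor) that checks the informal address constraints against both example documents. It assumes the content model can be written as a regular expression over the sequence of child element names, and it hand-codes the ZIP datatype check; validate_address, CONTENT_MODEL, and ZIP_PATTERN are hypothetical names chosen for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model constraint, written as a regular expression over
# child tag names: one <name>, one or more <street>, then exactly one <city>,
# <state>, and <zip>, in that order.
CONTENT_MODEL = re.compile(r"name (street )+city state zip ")

# Hypothetical datatype constraint for <zip>: five digits, optionally
# followed by a hyphen and exactly four more digits.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def validate_address(xml_text: str) -> list[str]:
    """Return a list of constraint violations; an empty list means valid."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "address":
        errors.append(f"root element is <{root.tag}>, expected <address>")
    # Turn the child elements into a string such as
    # "name street city state zip " and test it against the content model.
    sequence = "".join(child.tag + " " for child in root)
    if CONTENT_MODEL.fullmatch(sequence) is None:
        errors.append("content model violated: " + sequence.strip())
    # Check every <zip> element's text against the datatype constraint.
    for zip_elem in root.iter("zip"):
        text = (zip_elem.text or "").strip()
        if ZIP_PATTERN.fullmatch(text) is None:
            errors.append(f"invalid ZIP code: {text!r}")
    return errors


valid = """<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <zip>12481-6326</zip>
</address>"""

invalid = """<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <state>CT</state>
  <zip>blue</zip>
</address>"""

print(validate_address(valid))    # → []
print(validate_address(invalid))  # two violations: repeated <state>, ZIP 'blue'
```

A schema language lets you declare constraints like these once and have a validating parser enforce them, instead of hand-coding each check as this sketch does.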

The ability to test the validity of documents is going to be an important aspect of large web applications that send and receive information to and from many sources. If you're receiving XML transactions over the web, you don't want to load the content into your database if it doesn't conform to the proper schema. The earlier and more easily you can catch this sort of error, the better off you'll be. (You wouldn't want to issue someone a refund check because you allowed them to order -4 hammers, would you?)