Given the preceding discussion of type declarations, it follows that some documents are valid and some are not. There are two categories of XML documents: well-formed and valid.
A document can only be well-formed [Section 2.1] if it obeys the syntax of XML. A document that includes sequences of markup characters that cannot be parsed or are invalid cannot be well-formed.
In addition, the document must meet all of the following conditions (understanding some of these conditions may require experience with SGML):
- The document instance must conform to the grammar of XML documents. In particular, some markup constructs (parameter entity references, for example) are only allowed in specific places. The document is not well-formed if they occur elsewhere, even if the document is well-formed in all other ways.
- The replacement text for all parameter entities referenced inside a markup declaration consists of zero or more complete markup declarations. (No parameter entity used in the document may consist of only part of a markup declaration.)
- No attribute may appear more than once on the same start-tag.
- String attribute values cannot contain references to external entities.
- Non-empty tags must be properly nested.
- Parameter entities must be declared before they are used.
- All entities except the following: amp, lt, gt, apos, and quot must be declared.
- A binary entity cannot be referenced in the flow of content, it can only be used in an attribute declared as ENTITY or ENTITIES.
- Neither text nor parameter entities are allowed to be recursive, directly or indirectly.
By definition, if a document is not well-formed, it is not XML. This means that there is no such thing as an XML document which is not well-formed, and XML processors are not required to do anything with such documents.
A well-formed document is valid only if it contains a proper document type declaration and if the document obeys the constraints of that declaration (element sequence and nesting is valid, required attributes are provided, attribute values are of the correct type, etc.). The XML specification identifies all of the criteria in detail.