Types of Entities
August 28, 1998
Do you ever get tired of typing the name of your company, "Yoyodyne Industries, Inc."? Have you ever had the pleasure of spelling it incorrectly in an important document? Internal entities offer a convenient solution to these problems.
Instead of typing the same text over and over again, you can define an internal entity to contain the text and then you only need to use the entity where you want to insert the text. Because the entity is expanded by the parser, you can be assured that you'll get the same text in every location. The parser will also catch typos if you misspell an entity name (so long as there's no entity name that matches your typo!).
To use an entity you insert an "entity reference" into your document. You're probably already familiar with some entity references because you need to use them for special characters that cannot be typed directly in an XML document, like "<" (written "&lt;") and "&" (written "&amp;"). An entity reference is an ampersand (&), followed by the name of the entity, followed by a semicolon (;).
If you've defined the entity "yoyo" to contain the name of your company, then you can use it with the following entity reference "&yoyo;".
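As a minimal sketch of the idea, the following declares "yoyo" in the internal subset and references it twice. Python's standard-library parser is used here only as a convenient way to show the expansion; the memo element and the helper names expand and is_well_formed are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# The entity name "yoyo" comes from the text; the <memo> element is hypothetical.
MEMO = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY yoyo "Yoyodyne Industries, Inc.">
]>
<memo>Welcome to &yoyo;! Everyone at &yoyo; is glad you came.</memo>"""


def expand(document):
    """Parse the document and return the root element's text after expansion."""
    return ET.fromstring(document).text


def is_well_formed(document):
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(expand(MEMO))
# "yoyp" is a typo for "yoyo"; no such entity is declared, so parsing fails.
print(is_well_formed(MEMO.replace("&yoyo;!", "&yoyp;!")))
```

Both references expand to the same text, and the misspelled reference is rejected rather than silently passed through.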
The text that is inserted by an entity reference is called the "replacement text". The replacement text of an internal entity can contain markup (elements, attributes, processing instructions, other entity references, etc.), but the content must be balanced (any element that you start in an entity must end in the same entity) and circular entity references are not allowed.
You create internal entities with entity declarations in the internal subset or the DTD.
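The rules for replacement text can be tried out with a short sketch. The entity names (co, sig, open, a, b), the doc element, and the helpers parse and accepted are all hypothetical; the declarations are placed in an internal subset and handed to Python's standard-library parser.

```python
import xml.etree.ElementTree as ET


def parse(declarations, body):
    """Wrap entity declarations in an internal subset and parse the document."""
    return ET.fromstring(f"<!DOCTYPE doc [{declarations}]><doc>{body}</doc>")


def accepted(declarations, body):
    """Report whether the parser accepts the resulting document."""
    try:
        parse(declarations, body)
        return True
    except ET.ParseError:
        return False


# Replacement text may hold elements and references to other entities.
BALANCED = """
  <!ENTITY co "Yoyodyne">
  <!ENTITY sig "<sig>&co; <em>Legal</em></sig>">
"""
# An element opened inside the entity is never closed inside it.
UNBALANCED = '<!ENTITY open "<sig>never closed">'
# Each entity refers to the other.
CIRCULAR = '<!ENTITY a "&b;"> <!ENTITY b "&a;">'

sig = parse(BALANCED, "&sig;")[0]
print(sig.tag, repr(sig.text), sig[0].tag, sig[0].text)
print(accepted(UNBALANCED, "&open;</sig>"))  # closing the tag in the document does not help
print(accepted(CIRCULAR, "&a;"))
```

The balanced entity produces real elements in the parsed tree, while the unbalanced and circular ones are rejected as soon as they are referenced.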
Five internal entities are predefined in XML:
Table 1. Predefined Entities
|Entity Name||Replacement Text|
|lt||The less than sign (<)|
|gt||The greater than sign (>)|
|amp||The ampersand (&)|
|apos||The single quote or apostrophe (')|
|quot||The double quote (")|
All XML processors are required to support references to these entities, even if they are not declared.
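A quick way to see this is to parse a document that has no declarations at all. The sketch below uses Python's standard-library parser; the note element and its title attribute are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE and no declarations: the five predefined names still resolve.
NOTE = ('<note title="&quot;R&amp;D&quot;">'
        "1 &lt; 2 &amp;&amp; 3 &gt; 2, isn&apos;t it?</note>")

note = ET.fromstring(NOTE)
print(note.get("title"))
print(note.text)
```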
Character references, which are similar in appearance to entity references, allow you to reference arbitrary Unicode characters, even if they aren't available directly on your keyboard. Character references are not properly entities at all.
Character references are numeric and can be used without any special declaration.
The basic format of a character reference is either "&#nnn;" or "&#xhhh;" where "nnn" is a decimal Unicode character number and "hhh" is a hexadecimal Unicode character number.
A character reference inserts the specified Unicode character directly into your document. Note that this does not guarantee that your processing or display system will be able to do anything useful with the character. For example, &#x236E; would insert, in the words of the Unicode standard, an "APL Functional Symbol Semicolon Underbar". Whether or not you can print that character is an entirely different issue.
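The two forms are interchangeable. As a small sketch (the sym element is hypothetical), the APL character mentioned above, U+236E, can be written in hexadecimal or in decimal (0x236E is 9070) and the parser delivers the same character either way:

```python
import unicodedata
import xml.etree.ElementTree as ET

# The same character, once as a hexadecimal and once as a decimal reference.
root = ET.fromstring("<sym>&#x236E;&#9070;</sym>")
hex_char, dec_char = root.text

print(hex_char == dec_char)
print(unicodedata.name(hex_char))
print(f"U+{ord(hex_char):04X}")
```

No declaration is needed for either reference; whether the resulting character can be displayed depends on the fonts available, not on the parser.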
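Character references differ from entity references in a subtle but significant way: the parser expands them immediately, as soon as it reads them. Using '&#34;' in character data is exactly the same as typing '"'. The difference matters most in entity declarations, where a character reference in the replacement text is replaced when the declaration is read, so "&#60;" there becomes a literal "<" that is treated as markup each time the entity is referenced. In an attribute value, by contrast, "&#34;" is recognized as a reference before it is expanded, so it does not end the value and can safely stand in for a quotation mark.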
External entities offer a mechanism for dividing your document up into logical chunks. Rather than authoring a monolithic document, a book with 10 chapters for example, you can store each chapter in a separate file and use external entities to "source in" the 10 chapters.
Because external entities in different documents can refer to the same files on your file system, external entities provide an opportunity to implement reuse. Reuse of small, discrete components (figures, legal boilerplate, warning messages) is fairly easy to manage. Implementing reuse on a large scale requires an entity management system, which XML, by itself, does not provide.
A few notes about external entities:
External entities do not have to consist of a single element; you can make a sequence of three paragraphs, or even a run of character data with embedded inline markup, into an external entity. But the tags in an external entity must be well balanced: you can't start an element in an entity and end it in your document or in another entity.
External entities can reference internal or other external entities, but you cannot have circular references.
You can refer to the same external entity several times in a single document. Note, however, that if you do this, you will have to avoid using ID attributes in the external entity if you're concerned about validity. Referencing an external entity that contains an ID from more than one location produces a document with duplicate IDs, which is a validity error.
It is legal to have several external entities that all refer to the same external file.
There are no additional restrictions placed on the character encodings used by external entities. In particular, external entities with differing encodings can be used in the same document.
External entities, like internal entities, have names and are referenced in the same manner, although they are declared differently.
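To make the book-and-chapters idea concrete, here is a sketch that declares two external entities with SYSTEM identifiers and references them from the document. It drives Python's low-level expat binding directly, because a parser has to be told how to fetch external entities. The file names, the element names, and the load helper are assumptions, and the "files" live in a dictionary so the sketch does not depend on a real file system.

```python
from xml.parsers import expat

# Hypothetical chapter files, held in a dict so the sketch needs no file system.
FILES = {
    "chap1.xml": "<chapter><title>Getting Started</title></chapter>",
    "chap2.xml": "<chapter><title>Going Further</title></chapter>",
}

BOOK = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY chap1 SYSTEM "chap1.xml">
  <!ENTITY chap2 SYSTEM "chap2.xml">
]>
<book>&chap1;&chap2;</book>"""


def load(document, files):
    """Return (element names, character data) with external entities expanded."""
    names, text = [], []

    def attach(parser):
        # Install the same callbacks on the top-level parser and on every child.
        parser.StartElementHandler = lambda name, attrs: names.append(name)
        parser.CharacterDataHandler = text.append
        parser.ExternalEntityRefHandler = (
            lambda context, base, system_id, public_id:
                resolve(parser, context, system_id))

    def resolve(parent, context, system_id):
        # Parse the referenced "file" with a child parser created for this entity.
        child = parent.ExternalEntityParserCreate(context)
        attach(child)
        child.Parse(files[system_id], True)
        return 1  # a non-zero result tells expat the entity was handled

    top = expat.ParserCreate()
    attach(top)
    top.Parse(document, True)
    return names, "".join(text)


names, text = load(BOOK, FILES)
print(names)
print(text)
```

The references look exactly like internal entity references; only the declarations differ. Referencing chap1 twice instead of chap1 and chap2 would pull the same chapter in twice, which is where the duplicate ID problem noted above comes from.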
Internal and external entity references are not expanded in the DTD or the internal subset (this allows you to use entity references in the replacement text of other entities without concern about the order of declarations). If you want to have the effect of entities and entity references in your DTD, parameter entities must be used. Parameter entity references use the "%" character instead of the "&". Parameter entities can't be used in the content of your document; they simply aren't recognized.
It is legal to have a parameter entity and an internal or external entity with the same name. They are completely different types of entities and cannot conflict with each other.
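The following sketch shows both points: a parameter entity and a general entity that share the name "name", and a "%" sequence in document content that is left alone. It uses Python's expat binding, which by default does not process parameter entity references and has to be asked to. The entity names, the doc element, and the text_of helper are hypothetical.

```python
from xml.parsers import expat

# "%name;" and "&name;" refer to two unrelated entities that share a name.
DOC = """<!DOCTYPE doc [
  <!ENTITY % name "<!ENTITY extra 'declared by the parameter entity'>">
  <!ENTITY name "the general entity">
  %name;
]>
<doc>&name; / &extra;</doc>"""


def text_of(document):
    """Return the character data of the document after entity expansion."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to expand parameter entity references in the DTD.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


print(text_of(DOC))
# In document content "%name;" is ordinary text, not a reference.
print(text_of("<doc>%name;</doc>"))
```

Here the parameter entity is referenced between declarations in the internal subset, where it contributes a complete declaration of its own.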
One common use of parameter entities is in conditional sections. Conditional sections are a mechanism for parameterizing the DTD. Note, however, that you cannot use conditional sections in the internal subset of XML documents.