Why Are External Entities Included Optionally?

In discussing external entities, we realized that the semantics of external text entities (compulsory inclusion at the point where they are encountered) are deeply incompatible with the desired behavior of Web browsers. Consider the following beginning of an XML document:

<?xml version='1.0'?>
<!DOCTYPE doc [
<!ENTITY MSA SYSTEM "http://www.microsoft.com/press/311.xml">
<!ENTITY NSA SYSTEM "http://home.netscape.com/PR/x27.xml">
]>
<doc>Netscape today announced that &NSA;. In response, Microsoft issued the following statement: &MSA;.
...

A Web browser typically makes an aggressive effort to display text to the user as soon as possible, in parallel with fetching it from the network. In the example above, a browser that was required to fetch and process all external entities could display only the first four words before it had to start another network fetch. To make things worse, the replacement text for the entity NSA could well include other external entities, which would in turn need to be fetched.

This kind of stall is unacceptable in a browser. Hence the rule that non-validating parsers need not fetch external entities if they don't want to.
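
The difference between the two policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function render_events and its include_external flag are hypothetical names invented here, the regular expression stands in for a real XML parser, and no network access takes place. The code merely lists, in order, what a browser would have to display, fetch, or skip for the example document above.

```python
import re

# External entity declarations from the example document
# (entity name -> system identifier).
EXTERNAL = {
    "MSA": "http://www.microsoft.com/press/311.xml",
    "NSA": "http://home.netscape.com/PR/x27.xml",
}

# Content of the <doc> element from the example.
BODY = ("Netscape today announced that &NSA;. In response, "
        "Microsoft issued the following statement: &MSA;.")

# Simplified pattern for an entity reference; a real parser uses the
# full XML Name production.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def render_events(body, external, include_external):
    """Return the ordered actions a hypothetical browser would take.

    Each event is a (kind, value) pair where kind is "display",
    "fetch", or "skip". Nothing is actually fetched.
    """
    events = []
    pos = 0
    for match in ENTITY_REF.finditer(body):
        name = match.group(1)
        if name not in external:
            continue  # not an external entity: stays in the running text
        if match.start() > pos:
            events.append(("display", body[pos:match.start()]))
        if include_external:
            # Compulsory inclusion: rendering stalls on a network fetch.
            events.append(("fetch", external[name]))
        else:
            # Optional inclusion: note the reference and keep going.
            events.append(("skip", name))
        pos = match.end()
    if pos < len(body):
        events.append(("display", body[pos:]))
    return events


mandatory = render_events(BODY, EXTERNAL, include_external=True)
optional = render_events(BODY, EXTERNAL, include_external=False)

for kind, value in mandatory:
    print(kind, repr(value))
```

With include_external=True the first event is a display of four words and the second is already a fetch; with include_external=False the list contains no fetch events at all, so the whole text can be shown without waiting on the network.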


Copyright © 1998, Tim Bray. All rights reserved.