When XML Gets Ugly

March 2, 2000

Simon St. Laurent

After being introduced by Jon Bosak as "the one who does most of the work," XTech 2000 co-chair David Megginson took a look at some of the less pleasant possibilities of XML and the "XML Web" envisioned by the W3C.

The XML Web is a vision of a world where machines can readily read information from the Web. That information, whether static or dynamically generated, is referenced by a URI. Programs can therefore retrieve it using standardized protocols (like HTTP and SMTP) without developers having to write code against JDBC or ODBC. As Megginson summed it up: "Computers could take XML, stick it in a database, and do something cool with it."

Sadly, some real problems with this dream also threaten the larger dream of the semantic Web. Megginson compared the XML Web today to the Internet in 1988, when a general lack of concern about security (the naivete of a small and friendly community) made it possible for the "Internet Worm" to shut down the Internet quickly, in the most effective denial of service attack to date.

The first, relatively simple, problem with the XML Web is the unpredictability introduced by XML's tools for referencing external resources, which make it difficult to predict when, or even if, those resources will arrive. Processing self-contained XML documents can usually be done within a predictable period of time, but dependencies add an often unknown amount of time to the work. Caching, along with agreements about how long DTDs, schemas, and other shared resources count as "fresh," can help with this problem, though some level of uncertainty remains.
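As a rough illustration of the caching idea, here is a minimal Python sketch of a cache that reuses a shared resource until an agreed freshness period runs out. The class name `FreshnessCache`, the `fetch` callable, and the `max_age_seconds` parameter are all hypothetical names chosen for this example; nothing here comes from Megginson's talk.

```python
import time
from typing import Callable, Dict, Tuple


class FreshnessCache:
    """Cache shared resources (DTDs, schemas) and refetch them once stale.

    `fetch` is a caller-supplied function (hypothetical) that retrieves the
    resource body for a URI; `max_age_seconds` is the agreed freshness period.
    """

    def __init__(self, fetch: Callable[[str], bytes], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, uri: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(uri)
        if cached is not None and now - cached[0] < self._max_age:
            # Still fresh: answer locally, in predictable time.
            return cached[1]
        # Missing or stale: pay the unpredictable cost of a remote fetch.
        body = self._fetch(uri)
        self._entries[uri] = (now, body)
        return body
```

The cache makes the common case predictable, but the first request and every refresh still depend on the remote server, which is the residual uncertainty described above.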

As he moved into more dangerous waters, Megginson noted how much the XML community trusts each other: "I take code from James Clark and Tim Bray and run it—sometimes even with 'root' privileges. That's bone-head stupid! Because we're all friends, we feel pretty good for now." The audience response made it fairly clear that David wasn't the only one doing this.

While this kind of trust is appropriate to business transactions in prearranged and typically secure relationships, there's a problem as soon as those transactions are generalized and begin to rely on shared resources. Megginson opened the discussion of what could go wrong with his "Cracker's Companion to the New XML Order."

Cracker's Companion

In the earliest phase of attacks, Megginson described ways to use cascading style sheets to vandalize sites that depend on them. Because it's easy to reuse style sheets stored on systems outside the direct control of the user, modifications made to "master" style sheets can have widespread consequences. The whiteout attack can make words disappear in critical locations, while vandalism attacks can deface or distort the content of a site.

CSS2 and XSL style sheets open even more powerful possibilities. Fortunately, though these features are extremely powerful and extremely useful, referencing style sheets from other sources is still a fairly uncommon practice. As Megginson put it, "Stupidity is often the biggest protection."

After analyzing the weaknesses of style sheet referencing, Megginson moved on to a field where external resources are used much more regularly: external DTDs. The entity lists and DTDs for XHTML 1.0, for instance, are likely to be a commonly used set of information. Many developers will use and trust them, but few will know much about the level of security on the W3C's servers, or on any similarly trusted repository. Hopefully, it's wonderful security, but if it's not, there are lots of possibilities for causing trouble.

The simplest attack involves adding declarations that break validation. Adding extraneous declarations to a list of character entities can effectively "break the contract" used by documents. A tiny change that produces a fatal error in a DTD could halt XML processing on a large scale. Extraneous declarations are fairly obvious, but more sophisticated tricks, like changing attributes from being optional to required, can be difficult to track down. Perhaps the most dangerous option available to crackers is redefining default values for attributes. For example, if developers have relied on defaulted attributes for security, a relatively small change might expose enormous quantities of information.
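The defaulted-attribute attack can be sketched in a few lines of Python using the standard library's Expat binding. This is only an illustration under stated assumptions: the `record` element, its `access` attribute, and the helper `root_attributes` are invented for the example, and the declarations are placed in an internal DTD subset to keep the sketch self-contained, whereas the attack described here targets a shared external DTD.

```python
import xml.parsers.expat


def root_attributes(dtd_declarations: str) -> dict:
    """Parse one fixed document against the given declarations and return
    the attributes the parser reports for its root element."""
    document = (
        "<!DOCTYPE record [\n" + dtd_declarations + "\n]>\n"
        "<record>payroll data</record>"  # the instance never mentions 'access'
    )
    seen = {}

    def on_start(name, attrs):
        # Expat reports defaulted attributes alongside specified ones.
        seen.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


# Hypothetical declarations: the only difference is the default value.
ORIGINAL = '<!ATTLIST record access CDATA "private">'
TAMPERED = '<!ATTLIST record access CDATA "public">'

print(root_attributes(ORIGINAL))
print(root_attributes(TAMPERED))
```

The document instance is identical in both runs; only the declaration changes, yet an application that reads `access` would see a different value.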

Apart from the structural possibilities, another option, "entity spoofing," can be used to insert text into documents, vandalizing and perhaps conveying a message. Because XML 1.0 permits multiple entity declarations, and the first declaration takes precedence, it's possible to insert malicious content where an entity is used. Megginson demonstrated by inserting the Communist Manifesto in every occurrence of &mdash;!
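The first-declaration-wins rule behind entity spoofing can be observed with a short Python sketch. The helper `expand` and the sample strings are hypothetical, and the declarations sit in an internal DTD subset for brevity; the sketch shows the precedence rule only, not an actual compromise of a shared entity file.

```python
import xml.parsers.expat


def expand(declarations: str, body: str) -> str:
    """Return the character data of a <p> element after entity expansion."""
    document = "<!DOCTYPE p [\n" + declarations + "\n]>\n<p>" + body + "</p>"
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


# Hypothetical declarations for the same entity name.
LEGITIMATE = '<!ENTITY mdash "&#8212;">'
SPOOFED = '<!ENTITY mdash " [attacker text] ">'

# With only the legitimate declaration, the entity expands to a dash.
print(expand(LEGITIMATE, "before&mdash;after"))

# A spoofed declaration placed first silently overrides the later one.
print(expand(SPOOFED + "\n" + LEGITIMATE, "before&mdash;after"))
```

No error is raised in the second case: the later, legitimate declaration is simply ignored, which is what makes this kind of substitution hard to notice.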

Megginson noted that these were not particularly innovative attacks: "Compared to the way people break into computers today, this is brain-dead simple." Although he had some workarounds and solutions for keeping these crackers out, Megginson acknowledged that XML's openness might make some of them harder to use. The simplest option, "cut and paste," copies necessary resources to local secure locations rather than relying on external repositories. This "cutting the cord" gives users control over their own resources, but reduces flexibility as well. A more sophisticated approach, replacing simple pull with a publish-and-subscribe model, involves more overhead and new relationships. Megginson's preferred solution, digital signatures, is the "simplest but scariest" and requires a good deal more development to make effective.
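A full digital-signature scheme is beyond a short example, but the underlying idea of refusing a shared resource that has changed can be sketched with a pinned digest. This is a deliberately simpler stand-in, not the signature mechanism Megginson had in mind: it assumes the consumer recorded a SHA-256 digest of a trusted copy in advance, and the names `pin`, `verify`, and `TamperedResourceError` are hypothetical.

```python
import hashlib
import hmac


class TamperedResourceError(Exception):
    """Raised when a fetched resource does not match its pinned digest."""


def pin(resource: bytes) -> str:
    """Record the SHA-256 digest of a copy that is known to be trustworthy."""
    return hashlib.sha256(resource).hexdigest()


def verify(resource: bytes, expected_digest: str) -> bytes:
    """Return the resource unchanged, or refuse it if the digest differs."""
    actual = hashlib.sha256(resource).hexdigest()
    if not hmac.compare_digest(actual, expected_digest):
        raise TamperedResourceError("resource does not match pinned digest")
    return resource


# Usage with a hypothetical DTD fragment.
trusted_dtd = b'<!ATTLIST record access CDATA "private">'
digest = pin(trusted_dtd)
verify(trusted_dtd, digest)  # passes silently
```

Unlike a signature, a pinned digest has to be updated by hand whenever the publisher legitimately revises the resource, so it shares the loss of flexibility noted for the "cut and paste" approach.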

After opening our eyes to the potential for malicious misuse of XML, Megginson envisaged a call to catch the future criminal of the XML Web: