Eternal Refactoring

July 7, 2004

Edd Dumbill

Since handing the reins to the capable and erudite Kendall Clark, it is a pleasure to assume the mantle of XML-Deviant columnist once again. I'll be summarizing discussions and happenings from the XML developer community, centering on the XML-DEV mailing list and the associated world of XML and web standards.

As ever, I'll be more than happy to hear your feedback, which you can leave in the forum at the bottom of the page or send to me via email.

What Price Sanity?

Roger Costello dangled an irresistible conversation starter in front of the XML-DEV regulars: the question of whether progress in XML is best served by simplification or by added complexity. The canonical example of simplification driving progress is, of course, XML being a simplified SGML. Yet beyond this point of simplification, as few would deny, a baffling array of complexity has emerged.

Costello presented two theses:

Approach 1 -- Progress via Simplification

With this approach the attitude is: "What is the simplest collection of components needed to achieve all the complexity required?" Interestingly, this approach strives for greater complexity by removing complexity.

I think that typically the right collection of components is not found on the first attempt. Typically, the first attempt produces a collection of components that are too complicated. So successive versions of the technology result in simpler components. But these simple components can be assembled to produce results that are as complex (or more so) than the earlier components....

Approach 2 -- Progress via Complexification

With this second approach the attitude is: "The existing functionality does not give users all the desired complexity, so let's add more functionality." Thus greater complexity is achieved by adding more complexity.

As I look at the next version of some of the XML technologies it appears that this second approach is being taken. For example, with XSLT 2.0 and XPath 2.0 you are able to accomplish what was extremely difficult (or impossible) in 1.0. However, this enhanced complexity is achieved by adding more complexity to the language. I believe that XML Schemas 2.0 is going along the same path -- more complexity by adding more complexity....

Costello's closing challenge to readers is to ask whether the 2.0 versions of XML technologies, rather than adding complexity, should aim for simplicity and in fact be refactored versions of their former selves.

Norm Walsh saw a direct mapping from Costello's approaches to the XML world.

Approach 1 -- Progress via Simplification ... This is standardization. Standardization simplifies an existing technology by blunting some of the sharper edges and knocking off some of the larger burrs. No new problems are solved by standardization.

XML 1.0 was achieved by standardizing SGML.

Approach 2 -- Progress via Complexification ... This is design by committee. Design by committee increases the complexity of whatever technology it starts with (be it a new idea or of an old idea) because it struggles to solve new problems in ways that achieve the process goal of "consensus."

XPath 2.0 is being achieved by extending XPath 1.0 by design-by-committee.

However, Walsh declines to come down on the side of either, claiming both have value.

I think design by committee achieves non-technical goals (agreements between competitors about what to implement, attempts to reduce the number of things that customers have to learn, reducing the risk associated with implementation costs) that standardization can't. In order to standardize, you have to have an existing technology. We want to achieve new technologies without sacrificing the non-technical goals that design by committee achieves, so we're stuck with a cycle of simplification followed by complexification.

Mike Champion doesn't see it so clearly, however. First, he says, "simplicity" isn't simple, breaking down into at least two dimensions: "minimalism (how many simple concepts are there) and easiness (is it simple to use by the target audience)." He notes that seemingly elegant and simple theories can become complex for users, because they require operation in the realms of abstractions rather than more concrete ideas.

For example, consider the relational model of data. It is simple in the elegant sense, but requires implementers and users of DBMS systems to deal with abstractions such as "normalization" that stand between ordinary reality and the theory.

Champion concludes that there's a tension between technologies that are simple and easily implementable, yet soaked in abstractions, and those that are complex and clumsy, yet have many tools for solving real-world problems. This is a tension we are stuck with, he says: "Eternal refactoring is the price of sanity."

Material Concerns

Zooming over to the RDF Interest list, I will spare you an extended discussion on the topic of identity, and instead focus on the altogether more tangible.

Some may consider Semantic Web developers to be very much concerned with the abstract, but a recent thread shows that good old materialism is as good a driver for progress on the Semantic Web as anywhere.

Many of us, no doubt, employ the wishlist facility on Amazon to communicate our birthday needs to distant relatives. The decentralization of this seems like a natural target for RDF-savvy developers.

Ryan Shaw asked if there's an RDF Schema or OWL Ontology for wishlists. In response, Danny Ayers mentioned recent thoughts from Dan Brickley on adding wishlist features to his Friend-of-a-Friend (FOAF) vocabulary. Brickley's examination of the subject shows there's more to wishlists than simply wanting gifts. He explores four notions of wishlists that have a bearing on such vocabulary design.

Sense 1: Wishlists as true descriptions ... This notion of a wishlist can be characterized as a relationship between a Person (or Agent) and a description of the world. The idea is that the "wish" is for the description to be true. Any RDF description of the world can be used.

Sense 2: To own particular things.

Sense 3: An information-oriented expression of interest ... Somewhat different, this one. Idea is that we might want to express informational wishes; for example, we might want to leave "questions" in the Web, and have services answer them, notifying us of the answers (via email, RSS/Atom, etc.).

Sense 4: Wanting to own things that match some particular template description ... the implicit assumption is that for each thing on the list, there is at most one desired object that matches the description (e.g. some particular book with ISBN and title). However, in practice there are often multiple objects that meet a description, and it is actually quite difficult to constrain item descriptions so that there is only one possible match.
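The ambiguity described in Sense 4 can be made concrete with a small sketch. This is only an illustration under stated assumptions, not part of FOAF or of Brickley's proposal: the `Book` type, the `matches` and `candidates` helpers, and the catalog data are all hypothetical. It shows that a template naming only a title can match several objects, while adding an ISBN narrows it to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """A concrete catalog item (hypothetical data, for illustration only)."""
    title: str
    isbn: str
    binding: str


def matches(template: dict, item: Book) -> bool:
    """An item matches when it agrees with every property the template states."""
    return all(getattr(item, key) == value for key, value in template.items())


def candidates(template: dict, catalog: list) -> list:
    """Return every catalog item that satisfies the template description."""
    return [item for item in catalog if matches(template, item)]


# Made-up catalog: two editions share a title but differ in ISBN and binding.
CATALOG = [
    Book(title="Example Title", isbn="0000000001", binding="hardcover"),
    Book(title="Example Title", isbn="0000000002", binding="paperback"),
    Book(title="Another Title", isbn="0000000003", binding="paperback"),
]

# A loose template (title only) versus a tighter one (title plus ISBN).
loose = {"title": "Example Title"}
tight = {"title": "Example Title", "isbn": "0000000002"}

print(len(candidates(loose, CATALOG)))  # 2
print(len(candidates(tight, CATALOG)))  # 1
```

A wishlist vocabulary built on template descriptions would presumably need to decide what happens in the first case: whether any matching item satisfies the wish, or whether the description must be tightened until only one candidate remains.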

On the topic of Amazon wishlists, Ayers pointed to an XSLT stylesheet from Morten Frederiksen that turns Amazon's web-service-accessible wishlists directly into RDF for inclusion in a FOAF file. Jon Hanna also pointed to Christopher Schmidt's considerations of the nearby concept of the FOAF tip jar, an RDF expression of the "donate" links often used by people to enable the public to contribute to some worthy endeavor of theirs.

Express Yourself at XML 2004

The XML 2004 conference is holding another art show this year. Eve Maler issued a call for participation.

This is an early call for all the "creatives" out there to get started creating a masterpiece that can be shown in the XML 2004 exhibit area. We're looking for pretty much any piece of art that both has an XML theme (be prepared to make your case for the connection...) and is portable enough for you to bring to the conference site in Washington, D.C. in mid-November.

Previous contributions included extracts from the XSL-FO spec set in calligraphy, hierarchical quilting, and handmade paper printed with XML-derived bar codes. Having seen Simon St. Laurent's excellent forays into the animation world, I wonder if further advances into performance art will be considered this year. XML Schema interpreted through mime and contemporary dance, anyone?

Births, Deaths, Marriages

The latest announcements from XML-DEV.

Sun Pitches Hat into the Binary XML Ring

Bob Wyman reports that Sun has published papers describing "a binary format for XML infosets that is an efficient alternative to XML."

Vtd-XML Released Under GPL

Version 0.5 of an extractive-parsing, XML-processing API for Java.

'Fantastic' XML editor for OS X

Superlatively titled XML editor for OS X. Looks to be in its early stages.

Windows XML Standards Library

Windows HTML Help versions of XML specs: adds XInclude, xml:id, and other documents.

Stylus Studio 5.3

XML IDE adds XML-differencing tool.

Excelsior! Path Preview 0.1

Open source XPath implementation for Mac OS X's Cocoa API. (What is it with these excitable OS X types?)

Oracle XML Developer's Kit v.

For AIX, HP-UX64, and Linux. Almost as many new features as dots in the version number: JAXP 1.2 support, C++ XML APIs, XMLSAXSerializer.


It doesn't matter, they can't hear you ... What is the difference between the XML Infoset and Canonical XML? ... Messages to XML-DEV this week, 73. Len rating, 11% ...

Mixing way too many metaphors: If your iPod has duct tape on it, it won't get you laid ... A noble but flawed endeavor: Asking XML-DEV which API is best. Why, mine, of course!