The Social Life of XML
December 23, 2003
I recently found a picture of the panelists at the XML DevCon 2001 session entitled "The Importance of XML." My body language told the story: I wasn't a happy camper. Of course I agreed with all the reasons the panel thought XML was important: for web services, for interprocess communication, and for business process automation. But I also thought XML was important for a whole different set of reasons that weren't on the conference's agenda. I thought XML was important for end-user applications, for human communication, and for personal productivity. I believed then, and I believe more strongly today, that it's a bad idea to separate those two ways of using XML.
When you get right down to it, what's really so special about web services? Is it distributed computing? Is it serialization and transfer of complex data? We've been there and done those things, though it's true that we didn't use to do them with cheap and ubiquitous XML technologies. So, is service-oriented architecture the real game-changer? Clearly a lot of us think so, and maybe we're right. But I want to focus on something much more basic.
The really important thing, it seems to me, is the way the XML document can become a shared construct, a tangible thing that processes and people can pass around and interact with. On the one hand, an XML document is the payload of a SOAP message that gets routed around on the Web services network -- a payload that represents, for example, a purchase order. On the other hand, an XML document is the form that somebody uses to submit, or approve, or audit that purchase order. Now, all of a sudden, these two documents are not only made of the same XML stuff, they can literally be the same XML document.
When Tim Bray talks about the tribal history of XML, he says the current focus on XML data wasn't foreseen by what he calls "publishing-technology geeks" who thought they were building what he calls the "smart-document format of the future." Maybe not, but I've never been able to make much of a useful distinction between documents and databases. For me every document is a database, and every database is an assembly of documents. The "publishing-technology geeks" and the "Database Divas" that Tim writes about may cling to their tribal allegiances for a while longer, but the interbreeding experiment is already a success. I can query any XML document, including the slideshow that accompanies this talk, as if it were a database. And I can absorb XML documents into relational databases in increasingly granular and flexible ways. We're heading toward an extraordinary convergence of documents and databases. But I'm not sure we're always as clear as we could be about why this convergence is happening, or what opportunities it presents. I don't think the fact that XML has its roots in publishing is an accident -- or if it was, then it was a happy accident.
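The claim that every document is a database can be made concrete with a few lines of Python. This is a minimal sketch: the slide-deck markup and element names are invented for illustration, but the query mechanism (the XPath subset supported by the standard library's ElementTree) is real.

```python
import xml.etree.ElementTree as ET

# A hypothetical slide deck, treated as a queryable database.
deck = ET.fromstring("""
<slideshow>
  <slide id="1"><title>The Social Life of XML</title></slide>
  <slide id="2"><title>Documents are databases</title></slide>
</slideshow>
""")

# An XPath-style query: select every slide title, as if selecting
# rows from a table.
titles = [t.text for t in deck.findall(".//slide/title")]
```

The same `findall` call works on any well-formed XML document, which is the point: nothing about the deck was designed to be a database, yet it answers queries like one.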
Let's imagine a purchase order flowing through a web services pipeline, sometime in the near future. It's an XML document, perhaps created with a tool such as InfoPath. The document carries core data elements -- an item number, a department code. But it also carries contextual metadata -- for example, a threaded discussion involving the requester, the reviewer, and the approver. This context is the key to understanding how the data got there and what it means.
Let's suppose Kathy, the department administrator, reminds Frank, the CIO, that Paul, the marketing guy, is way overdue for a PC upgrade. Frank pushes back: the budget is tight and something's got to give. So Paul negotiates a deal: he'll give up the DVD burner if he can still have the flatscreen he asked for. But since Paul is in marketing, and he does sometimes have to burn DVDs, Frank tacks a DVD burner onto the upgrade order for Marcia, who's also in marketing. But the deal is that Marcia will have to share that DVD burner with Paul.
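Here's one way such a purchase order might look, with the core data and the contextual narrative living in the same document. The element names and attributes are purely illustrative, not any real schema; the sketch just shows that once the discussion travels inside the document, both kinds of information are queryable.

```python
import xml.etree.ElementTree as ET

# A hypothetical purchase order that carries its own negotiation history.
po = ET.fromstring("""
<purchaseOrder>
  <item sku="PC-2001">desktop upgrade</item>
  <dept code="MKT"/>
  <discussion>
    <msg from="Kathy" to="Frank">Paul is way overdue for a PC upgrade.</msg>
    <msg from="Frank" to="Kathy">Budget is tight; something's got to give.</msg>
    <msg from="Paul" to="Frank">Deal: drop the burner, keep the flatscreen.</msg>
  </discussion>
</purchaseOrder>
""")

# The same document yields the core data...
sku = po.find("item").get("sku")

# ...and the contextual metadata: who took part in the negotiation.
negotiators = [m.get("from") for m in po.findall(".//msg")]
```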
Today this contextual narrative is mostly scattered across a bunch of different email inboxes. It never finds its way into the operational database, although it would be great if it did. That way, the next CIO might have a shot at sorting out the environment that she inherits from Frank. But there's more than archaeology at stake here. Documents, including the purchase order and the messages related to it, aren't just passive carriers of information. They're the warp on which we weave a socially constructed reality. Somehow, we need to find ways to connect that reality to the workflow and process orchestration systems now being invented.
When I read the specs that define how these systems will work, I'm struck most of all by their treatment of exceptions. Here's how the BPEL 1.1 spec puts it: "The ability to specify exceptional conditions and their consequences," it says, "is at least as important for business protocols as the ability to define the behavior in the 'all goes well' case." I agree. But when I read these computer-sciency descriptions of compensation scopes and upward-chaining exception handlers, I worry that we've left something important out of the picture. In our example, the exception was thrown by Frank, who asserted a veto for budgetary reasons. And it was handled by Paul, who agreed to a negotiated compromise that enabled the transaction to go forward.
This kind of scenario isn't an exception, if you'll pardon the pun. It's the rule. Everyone has an agenda; every transaction is a negotiation; and every outcome is a compromise. But the documents that help us to articulate agendas, conduct negotiations, and assess compromises don't exploit the contextual power of XML, and they aren't being woven into the web services fabric. I think that's a problem. I also think we can solve it without inventing huge amounts of new technology. Common sense, basic tools, and some elbow grease can take us a long way.
Of the various Microsoft slogans that have come and gone over the years, two in particular have stuck with me. The first, from 1990, was "information at your fingertips." In his Comdex speech that year, Bill Gates laid out a vision that's still, frankly, pretty elusive. It wasn't just about finding the information we're looking for, though that did require a leap of imagination back before Internet search came along and made it look easy. The premise of "information at your fingertips" was also that we would empower knowledge workers to interact with that information. These folks -- who we're now supposed to call information workers, by the way, because knowledge evidently sounds too elitist -- these folks aren't just passive consumers of information, they're active creators of it. They need tools to produce, combine, transform, analyze, and exchange lots of different kinds of data, without tripping over differences in the underlying formats or editing tools.
The solution proposed at the time was compound documents with embedded active parts. Microsoft called this OLE; Apple, IBM, Novell, and Sun called it OpenDoc. You don't hear much about OLE and OpenDoc any more, and that's a shame because the problems they were meant to solve are still very much with us. I'm glad to see that WSRP (Web Services for Remote Portlets) is now tackling the problem from a web services perspective. It's a really good idea to work out how markup fragments -- and the machinery for interacting with those fragments -- can be packaged up for use on the web services network.
Back in the last century, of course, the assumption was that applications like Word and Excel were still going to control the data, and retain their own proprietary ways of representing it. The OLE interfaces would wake up chunks of that proprietary data for editing, and then tuck them back into bed in a binary filesystem-within-a-filesystem. This wasn't exactly a recipe for free-flowing data integration, but it sold some big fat programming books.
A decade after the 1990 Comdex speech, the .NET platform was rolled out with much celebration of XML as a universal data store, and with a new slogan -- "the universal canvas" -- that I absolutely love. It's an idea that makes intuitive sense to everyone. Science fiction writers have always imagined what this would be like. The best demonstration of the concept I've seen is a 1987 concept video produced by Apple, called Knowledge Navigator. When I mentioned it on my weblog last month and posted a link to the video, it attracted a huge amount of interest. We all have a deep conviction that networked computers are supposed to help us create and inhabit shared collaborative spaces where we can fluidly manage relationships, create and reuse information, and conduct business transactions.
Those transactions are governed by business protocols that we're working hard to formalize and automate. I don't want to trivialize the effort that's going to require. It's a deep problem and there's a lot we still don't know. Take, for example, the question of schemas. Some really smart people, including Jon Bosak, think we'll need a Universal Business Language to connect business protocols across different vertical-industry domains. Some other really smart people, including Jean Paoli, are tackling the problem from the bottom up, on the assumption that schemas need to emerge from specific practices before they can be codified in the large. I'm sure there's no simple answer, and I expect that both approaches will usefully coexist. But no matter how this plays out, the schemas and protocols are just the skeletal outlines of business processes. The flesh on the bones is the context that we create as we participate in these processes.
Weblogs are arguably the best examples we have of XML connecting people to other people in information-rich contexts. But while the glue that holds the weblog universe together is an XML application called RSS, it's really only a thin wrapper of metadata around otherwise opaque content. The RSS payload typically isn't XML, it's escaped HTML -- a practice that Norm Walsh calls an abomination. I think Norm is right to say that. So my own RSS payload, like a few others out there, includes namespaced XHTML. But the gymnastics that I have to perform in order to create that payload are another kind of abomination.
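The difference between the two kinds of payload is easy to demonstrate. In this sketch, the first item carries escaped HTML, which an XML parser can only see as a string; the second carries namespaced XHTML, whose structure is directly addressable. The item markup here is simplified for illustration, but the XHTML namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# An RSS-style item with escaped HTML: opaque content in a metadata wrapper.
escaped = """<item>
  <title>Post</title>
  <description>&lt;p&gt;Hello &lt;em&gt;world&lt;/em&gt;&lt;/p&gt;</description>
</item>"""

# The same content as namespaced XHTML: real, parseable structure.
namespaced = """<item xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <title>Post</title>
  <xhtml:body><xhtml:p>Hello <xhtml:em>world</xhtml:em></xhtml:p></xhtml:body>
</item>"""

# The escaped payload comes back as a flat string; the markup inside
# is invisible to the parser.
desc = ET.fromstring(escaped).find("description").text

# The namespaced payload is queryable: the emphasized word is an element.
ns = {"xhtml": "http://www.w3.org/1999/xhtml"}
em = ET.fromstring(namespaced).find(".//xhtml:em", ns).text
```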
We've waited a long time for XML-aware authoring tools that fit easily and naturally into the flow of the Web. Although this was the year in which Microsoft shipped an XML-aware version of Office, the sad truth is that it was still easier for me to create my presentation in Emacs, rather than in PowerPoint, or Word, or InfoPath.
Having said that, InfoPath, in particular, does get a number of things very right. It enables a relatively non-technical person to invent a schema, create a form that captures information that's valid with respect to that schema, and distribute the form to completely non-technical people who can fill it with data. What's more, the form, or document -- it's hard to know just what to call it -- has exactly the dual nature I've been talking about. Its information payload can be detached from a web services pipeline, edited offline by Kathy, emailed to Frank, edited offline by Frank, and injected back into the web services pipeline using email, or an HTTP postback, or a WSDL call.
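The detach-edit-reinject round trip can be sketched in a few lines. Everything here is hypothetical -- the document shape, the status element, and the (commented-out) endpoint URL -- but it captures the dual nature: the same XML object is both the payload a pipeline routes and the thing a person edits offline.

```python
import xml.etree.ElementTree as ET

# A purchase order detached from the pipeline for offline review.
po = ET.fromstring("<purchaseOrder><status>draft</status></purchaseOrder>")

# Frank's offline edit: he approves the order.
po.find("status").text = "approved"

# Serialize the edited document for reinjection into the pipeline.
payload = ET.tostring(po)

# Reinjection could be an email attachment, or an HTTP postback like:
#   urllib.request.urlopen("http://example.com/pipeline", data=payload)
```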
The email client is another way of packaging those components. And unless spam completely kills it, email is going to keep on being a primary lubricant of our business processes. Email is where most of our contextual information is created and exchanged, but where none of XML's contextual power is brought to bear. Here, by the way, Microsoft completely dropped the ball. The only Office 2003 application in which users can't create and use XML content is Outlook. But that's precisely where the need is greatest. Every day we ask questions about who said what, to whom, in reference to what, in response to whom. Because none of our routine written communication is well-formed, we fall back on decades-old search and navigation strategies in order to find things. And what we find is typically a mess. It's amazing to watch a highly-paid professional spending billable time trying to untangle what we like to call an "email thread," but what's really just a patchwork quilt of mangled fragments with no discernible order, structure, or attribution.
The problem with routine and casual use of well-formed content, of course, is that the XML parser is designed to keep the barbarians at bay. If the parser smells even a whiff of trouble, it slams the gate shut. As well it should. We wouldn't be having a web services revolution, right now, if we encouraged the kind of sloppiness that's rampant on the Web. But we do need to find ways to make it easier for the barbarians to become respectable citizens. We have these liberal parsers that browser developers have spent untold effort creating, parsers that can slurp up the worst kind of tag soup that comes pouring out of HTML composers, or is written by hand. Maybe we can get more mileage out of them.
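The two temperaments can be seen side by side in Python's standard library: feed the same tag soup to a strict XML parser and to the liberal parser behind `html.parser`. The snippet of soup is invented, but the contrast it demonstrates is exactly the one at issue.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tag soup: an unclosed paragraph and mis-nested inline tags.
soup = "<p>Unclosed paragraph<br><b>bold <i>mixed</b></i>"

# The XML parser smells trouble and slams the gate shut.
try:
    ET.fromstring(soup)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

# The liberal HTML parser slurps up the same soup without complaint.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(soup)
```

The liberal parser still recovers every element the author plainly intended, which is what "squinting and pretending" the content is well-formed would buy us.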
It's easy to just dismiss the barbarians, but there are an awful lot of them. They're creating and sharing tons of content that isn't well-formed, but in many cases we could squint and pretend that it is, just as browsers do. If we did that, we might be able to make the information they create and exchange more useful to them, as they perform the business scenarios we script for them. And we might also be able to make the information more useful to us, as we try to manage and debug those scenarios.
I think that the combination of XHTML, CSS, and XPath adds up to a fruitful opportunity, even at this late date. Back in 2001, at that other convention I mentioned, somebody asked Tim Bray when XML would replace HTML on the Web. Here was his answer:
Nobody thought for a microsecond that HTML would be replaced, and I don't think HTML will be replaced in my lifetime. It is the single most successful document format ever devised for delivering information electronically to humans. The population of computer users has voted for it overwhelmingly. I like it, I use it, I can't see why you'd want to stop using it.
I completely agree. And since we are going to keep on using HTML, it behooves us to find smarter and better ways to use it. XHTML is one of those smarter and better ways.
CSS is another. It strikes me as a really interesting opportunity to smuggle metadata into documents. People who don't know or care about metadata will nevertheless spend a lot of time fiddling with styles because they care a lot about how their documents look. A friend of mine, who's a teacher, told me that it takes her much longer to make presentations now, in PowerPoint, than it used to when she wrote them by hand on overhead-projector transparencies. There's a powerful human urge to achieve the right style. So let's exploit that. Let's promote packages of style tags that people will use just because they want to look cool. That's the immediate payoff. They don't need to know that those style tags are also hooks that make it easier to search for and manipulate content. Then, let's give them XPath-enhanced document viewers that do useful things with those hooks -- that cut down on the hassle and frustration of finding and reusing stuff. There's nothing earth-shattering here. It's just a modest proposal that aims to make better use of the tools and technologies already in place. Given the amount of hassle and frustration that's experienced by everyone on a daily basis, though, it's the kind of thing that could add up to a big payoff.
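Here's the style-tags-as-hooks idea in miniature. The class name and the page content are invented, but the mechanism is real: an attribute chosen for its looks becomes a handle that an XPath-aware viewer can query.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML where a "cool-looking" style class doubles as metadata.
page = ET.fromstring("""
<html>
  <body>
    <p class="pull-quote">Every document is a database.</p>
    <p>Ordinary narrative text.</p>
    <p class="pull-quote">Every transaction is a negotiation.</p>
  </body>
</html>
""")

# The style hook is also a query hook: find everything the author
# styled as a pull quote, with no extra markup effort on their part.
quotes = [p.text for p in page.findall(".//p[@class='pull-quote']")]
```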
It's also time to get serious about using XML to capture and represent real-world context. The XML and web services communities are doing a good job of reducing friction at the interface between processes and data. I'm pretty sure we can solve that eventually because it's the kind of problem that we, as technologists, are good at solving. We like to think about protocols and formats.
I'm not sure we'll do such a good job of reducing friction at the interface between people and data-driven processes. Success there will require serious attention to how people connect with one another, and with data, in information-dense, event-driven, networked environments. That means thinking about "human factors" and the "user experience" -- a couple of awkward phrases for things that we, as technologists, are not very good at dealing with. We don't like to think about habits, or agendas, or ways of thinking, or modes of communicating.
Fortunately, there's all that publishing DNA floating around in the XML community's gene pool. We've only got a few decades of experience with networked information systems. But we've got a few millennia of experience with documents. Let's use that to our advantage as we build out service-oriented architectures in which documents are both payloads and user interfaces. From a publishing perspective, we know a lot about how to build documents that capture and hold attention, establish historical and current contexts, and tell stories that help people understand themselves in relation to those contexts. We need to draw on all that publishing knowledge as we work out how to connect people to data-driven processes.
Here's another idea. The emerging web services network is radically open -- not only because the messages exchanged on that network are XML, but also because the services are connected using pipelines. We can inject intermediaries into those pipelines; the intermediaries can observe and act on the messages. So we can acquire a lot of useful context, and can implement useful policy, by reading and writing what goes by on the wire. Things don't tend to work the same way on the desktop, but maybe they could. Our personal productivity tools are in a position to learn a lot about how we interact with remote services, communicate with other people, and manage our data. And they're in a position to help us do those things more effectively. But the messages and events flowing on our local machines have nothing in common with the messages and events flowing in the cloud.
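The intermediary idea reduces to a simple pattern. This is an illustrative sketch, not any real pipeline API: a wrapper observes each message as it goes by on the wire, then forwards it unchanged to the next handler.

```python
# Minimal sketch of a pipeline intermediary. All names are hypothetical.

def make_intermediary(log, next_handler):
    """Wrap a handler so every message is observed before being forwarded."""
    def intermediary(message):
        log.append(message)           # observe: capture context off the wire
        return next_handler(message)  # act: here, just forward unchanged
    return intermediary

def endpoint(message):
    # Stand-in for the service at the end of the pipeline.
    return "processed: " + message

log = []
pipeline = make_intermediary(log, endpoint)
result = pipeline("<purchaseOrder/>")
```

The interesting property is that the endpoint needs no changes: context accumulates in the intermediary's log purely by virtue of sitting on the wire, which is what makes the open pipeline architecture attractive.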
For a long time I've thought that if we could bring these two worlds closer together, we could achieve powerful synergies. The idea got a boost recently when Microsoft revealed its plans for Indigo, the communication subsystem in Longhorn. Indigo aims, among other things, to make XML web services efficient for use across -- or maybe even within -- local applications. I invite you to think about what that could mean, not only for Longhorn but for all platforms, and not only in three years but also right now.