XML E-Business Standards: Promises and Pitfalls

January 5, 2000

Robert Worden

The Promise

The greatest driver of business change today is e-business, brought about by the Internet. In the near term, this revolution will have the most impact on business-to-business transactions. Recent announcements by Ford and General Motors that they both intend to use e-commerce for all their supply procurements—in Ford's case worth $80 billion per annum—have been noted by some commentators as signaling the day when e-business came of age.

To conduct e-business transactions, companies need a common language through which to exchange structured information between their computer systems. HTML, the first-generation language of the Internet, is not suited for this task as it defines only the formatting of information, not its meaning. Enter Extensible Markup Language—XML. Like HTML, XML consists of text delimited by tags; so it is easily conveyed over the Internet. In XML the tags can define the meaning and structure of the information, enabling computer tools to use that information directly.
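
To make the contrast with HTML concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The purchase order and its tag names (order, item, partNumber, quantity, unitPrice) are invented for illustration and are not taken from any real e-business standard. Because the tags say what each value means, a program can compute with the data directly rather than scraping it out of formatting.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; every tag name here is an illustrative assumption.
ORDER_XML = """
<order number="1001">
  <item>
    <partNumber>AX-200</partNumber>
    <quantity>4</quantity>
    <unitPrice currency="USD">12.50</unitPrice>
  </item>
  <item>
    <partNumber>BZ-310</partNumber>
    <quantity>10</quantity>
    <unitPrice currency="USD">3.00</unitPrice>
  </item>
</order>
"""


def order_total(xml_text):
    """Sum quantity * unitPrice over every <item> in the order."""
    root = ET.fromstring(xml_text.strip())
    total = 0.0
    for item in root.findall("item"):
        quantity = int(item.findtext("quantity"))
        unit_price = float(item.findtext("unitPrice"))
        total += quantity * unit_price
    return total


print(order_total(ORDER_XML))  # 80.0
```

The same figures in an HTML table would carry only layout tags such as rows and cells, so the program would have to guess which cell holds the quantity and which the price.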

XML has been embraced enthusiastically by all the major IT suppliers and user groups. Its standardization and rapid uptake have been the major development in IT over the past two years. Industry rivals like IBM, Microsoft, Sun, and Oracle all support the core XML 1.0 standard, are developing major products based on it, and collaborate to develop related standards. XML is now the world standard platform for e-business transactions. However, in any business application, XML itself is not the answer. It is only a standard foundation on which answers can be built. In that lack of prescription lies both the power, and the danger, of XML in e-business.

The Pitfalls

XML today is in a position similar to that of relational databases twenty years ago. The relational data model does not pre-define how you will store your data; it gives you a standard foundation, leaving you to choose how to use it to store data. You must define the tables and columns. What relational databases do for information storage, XML does for transmission of information over the Net. XML does not of itself define how information will be structured, or what it can mean. You define the tags and their semantics, giving structure and meaning to the data. Both XML and relational databases provide a framework for organizing data in a simpler, more flexible, and more powerful way than ever before.

To understand the future impact of XML, we must understand the past and present impact of relational databases. Because relational databases are so much better than what went before them (simpler, more powerful, more accessible), when they arrived around 1980 we started building many more new databases. New applications with new databases proliferated, grew, and became indispensable to their creators. Then we began to see the confusion we were creating.

As companies built tens and hundreds of new databases, the same data were stored in many different databases with redundancies, overlaps, and inconsistencies. The result today inside many big companies is information chaos—a corporate data spaghetti of system-to-system links for data exchange, application integration costs as high as 40% of the total IT budget, and, above all, delays in building vital new applications.

The problem with system-to-system interchanges between relational databases is this: If you have N databases, the number of possible data interchange links can grow as N squared. With even thirty different databases (and most companies have far more than that), there are 30 × 29 = 870 possible one-way links, nearly 1000. If even a small fraction of these interfaces have to be built, maintained, and understood, the resulting complexity will be unmanageable. Many companies have lost this data complexity battle and are paying a heavy price. Twenty years after the start of the relational era, we have still not solved the complexity problems it created.
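
The arithmetic behind this growth is easy to check. The following is a minimal sketch in Python, assuming that each ordered pair of databases may need its own one-way interface; the function names are illustrative only.

```python
def directed_links(n):
    """One-way interfaces: each of n databases may feed each of the other n - 1."""
    return n * (n - 1)


def undirected_pairs(n):
    """Pairs of databases, if a single interface served both directions."""
    return n * (n - 1) // 2


for n in (5, 10, 30, 100):
    print(n, directed_links(n), undirected_pairs(n))
```

For thirty databases this gives 870 one-way links (435 pairs); for a hundred it gives 9,900. Adding one more database to a set of N adds up to 2N new one-way links, which is why the burden grows faster than the number of systems.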

The potential pitfall of XML is this: Through widespread use of XML, we will create a greater data complexity battle—this time across whole industries rather than individual companies. And this time when we lose the battle, we will suffer even greater consequences.

As companies first venture into e-business, this may not seem to be a big problem. Surely, they say, we can adopt one of the emerging XML-based message standards (such as FinXML for financial transactions, or cXML for diverse commercial transactions) and simply build the interfaces between this message standard and our own core systems? Any one of these XML-based e-commerce standards is as complex as a medium-large relational database schema. Building interfaces between one of these standards and one of your company's large IT systems is a substantial piece of work, but feasible.

Unfortunately, there is not just one XML-based standard emerging, but many—for different industry sectors, and even several within the same sector. There are already XML-based standards wars between different industry groupings. It will not be possible to interface to just one of the standards, for many reasons: your company will not be operating in just one market sector; one standard does not address all your business needs; your business partners may back different standards; standards wars will continue, and you need to back the winners. As the new standards are used, they will grow and evolve. You will have to build and maintain interfaces between multiple systems and multiple XML standards just to stay in e-business.

XML-based message standards are today proliferating across the world, just as relational databases proliferated from the 1980s within individual companies. The result will be the same data confusion seen before, this time played out on a worldwide scale. Each company will have a patchwork of application databases (as before) and interfaces to many different XML message formats. The costs of interfacing packages, business processes, and legacy systems to the many XML standards are multiplicative, and may soon be the key inhibitor to your company exploiting new business models and processes.

This is a complexity trap at least as large, and as dangerous, as the complexity trap in multiple relational databases. In twenty years we failed to solve the relational complexity trap. How will we fare with the much bigger XML complexity trap?

There are two possible ways forward—to back some supra-standards repository "framework" such as Microsoft's BizTalk, or to manage the XML interfaces properly within your own company. While they are not mutually exclusive, I shall discuss them separately.

BizTalk to the Rescue?

The BizTalk framework, promoted by Microsoft and partners, aims to make it easier for individual companies to mix and match XML message formats from different vendors and standards groupings, picking out the sets that best meet their business needs and application mix.

It does so in three ways. First, it sets out a "canonical form," in which any application-specific set of XML message formats can be defined. Second, it provides a public repository where BizTalk-conformant XML message format sets can be validated, lodged, retrieved, and freely used. Third, the creators of BizTalk-conformant message standards are encouraged to lodge XSLT-based translations between their own formats and others' standard formats.

The theory is this: your company subscribes to BizTalk-conformant standard A, so you build interfaces from your IT systems to that message format. Your business partner subscribes to a different standard, B, which is also BizTalk-conformant. You send XML messages in standard A; your partner applies the XSLT translation between A and B (available from the BizTalk repository) to convert them into standard B and so understand them. If enough standards come under the BizTalk "umbrella," you can freely exchange messages with any business partner who uses it.
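
What one such A-to-B translation does can be sketched in a few lines. BizTalk envisages XSLT stylesheets for this job; the sketch below swaps in plain Python with the standard library's xml.etree.ElementTree, purely for illustration. Standards "A" and "B", their tag names, and the mapping table are all hypothetical, and a real translation would also have to handle differences in structure, units, and code lists, not just tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical message in "standard A" (tag names are assumptions).
MESSAGE_A = "<PurchaseOrder><Buyer>Acme</Buyer><Total>80.00</Total></PurchaseOrder>"

# Hypothetical mapping from standard A tag names to standard B tag names.
A_TO_B = {"PurchaseOrder": "order", "Buyer": "customer", "Total": "amount"}


def translate_a_to_b(xml_text):
    """Rebuild the message tree, renaming each element according to A_TO_B."""

    def convert(elem):
        out = ET.Element(A_TO_B[elem.tag], dict(elem.attrib))
        out.text = elem.text
        for child in elem:
            out.append(convert(child))
        return out

    return ET.tostring(convert(ET.fromstring(xml_text)), encoding="unicode")


print(translate_a_to_b(MESSAGE_A))
# <order><customer>Acme</customer><amount>80.00</amount></order>
```

Even this toy mapping has to know every tag in both formats, which hints at the effort involved when each format is as large as a database schema.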

Is this, then, the way forward for your company? Should you bet that the industry strength of Microsoft and its partners is enough to drive all the package solutions you and your business partners need into the BizTalk framework? Can you just format messages into any BizTalk-conformant standard, and then rely on BizTalk translations to do the rest?

The history of relational databases suggests not, because the BizTalk translation framework does not solve the N-squared problem. Just as for databases, if there are N different "standard" message formats, up to N(N-1) translations may be required. Currently the number of XML-based message standards defined by industry groupings (N) is well over 100, and growing. If you were the creator of one of these XML-based standards, would you spend the time to understand all the other standards—your competitors'—in enough detail to create and maintain all the necessary translations? It's enough work just to keep your own XML formats abreast of changing business needs; maintaining thirty or fifty XSLT translations as well would be a massive extra workload, for a limited payback.

For this reason, do not hold your breath for plug-and-play application compatibility via BizTalk XML. And do not bet your corporate IT strategy on it. If you are to avoid the N-squared trap of incompatible message formats and legacy systems, the solution lies within your own company.

Your Own Gold Standard

There is a viable way forward for individual companies, which is to take control of the problem yourself. The key is in this observation: nobody else is in exactly the same business you are. Therefore you need to build a single technology-independent logical model of all the information needed to drive your business—your own gold standard for business information—and then map all the different technology pieces (your own IT systems and external XML message formats) onto that logical model. Define message formats and translations from the logical model. Any data translation between system A and message format B is done not directly, but in two steps via the logical business model. By doing every data translation in two steps, you will abolish the N-squared complexity barrier. For each new IT system or XML message format, you only have to define one translation (to your logical business model), rather than N translations to other systems and message formats.
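
The saving can be put in numbers. This is a minimal sketch in Python, assuming that a direct translation is needed in each direction between every pair of formats, and that the mapping to the logical model is counted as two translations (one in, one out) per format; the function names are illustrative.

```python
def point_to_point(n):
    """Direct translations: every format to every other format, one way each."""
    return n * (n - 1)


def via_logical_model(n):
    """One translation into and one out of the logical model per format."""
    return 2 * n


for n in (3, 10, 30, 100):
    print(f"{n:>4} formats: {point_to_point(n):>5} direct, "
          f"{via_logical_model(n):>4} via model")
```

The two approaches break even at three formats. At thirty the comparison is 870 against 60, and each further format adds only two translations to the hub approach instead of roughly 2N to the point-to-point one.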

The steps required to do this are:

  1. Build a single logical model of the information needed to drive your business.
  2. Map your main IT systems and XML message formats onto the logical model.
  3. Define common XML message formats based on the logical business model.
  4. Define XML message translations into and out of the common message format.

Having done this, you can then translate between any two data models or XML message formats via the common message format. The hard steps here are (1) and (2); once they are done, steps (3) and (4) are largely mechanical. Tools and techniques are available to help you through these steps, which have been proven to work for large, complex enterprises. If you can succeed in this endeavor, the prize is well worth having—a coherent information architecture, insulating your company from a growing industry-wide data spaghetti and enabling you to adapt rapidly to new business models and data needs. In the era of e-commerce, the winners will be those agile companies who can move rapidly to new successful business models. A coherent, understandable corporate information architecture is the key to that agility.
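
The two-step translation can be sketched as follows. This is a minimal illustration in Python under loud assumptions: the logical model is reduced to a dictionary with two invented fields (customer, amount), and "system_a" and "format_b" with their field names are hypothetical stand-ins for an internal IT system and an external XML message format. It is not a description of any particular tool.

```python
# Step 2: map each system or message format onto the logical model.
# All field names below are hypothetical.

def system_a_to_model(record):
    return {"customer": record["cust_nm"], "amount": record["amt_cents"] / 100}


def model_to_system_a(model):
    return {"cust_nm": model["customer"], "amt_cents": round(model["amount"] * 100)}


def format_b_to_model(message):
    return {"customer": message["Buyer"], "amount": float(message["Total"])}


def model_to_format_b(model):
    return {"Buyer": model["customer"], "Total": f"{model['amount']:.2f}"}


# One (to_model, from_model) pair per format: 2N functions, not N(N-1).
ADAPTERS = {
    "system_a": (system_a_to_model, model_to_system_a),
    "format_b": (format_b_to_model, model_to_format_b),
}


def translate(data, source, target):
    """Translate in two steps: source -> logical model -> target."""
    to_model, _ = ADAPTERS[source]
    _, from_model = ADAPTERS[target]
    return from_model(to_model(data))


print(translate({"cust_nm": "Acme", "amt_cents": 8000}, "system_a", "format_b"))
# {'Buyer': 'Acme', 'Total': '80.00'}
```

Supporting a new format means adding one entry to ADAPTERS; the translate function and every existing adapter stay unchanged.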

The Way Forward

The BizTalk initiative is not the only attempt to bring order to the proliferation of industry sector XML applications. Others, such as the UN-backed ebXML initiative and the XML/EDI group, are proposing public repositories of XML message definitions. By doing so, they are attempting to link all definitions to a common business vocabulary so as to ease the N-squared translation problem.

Each of these "supra-standards" efforts will help to manage the complexity of different XML dialects. However, as with the XML dialects themselves, it is not at all clear which repository initiative will gain the most momentum early on, or win out in the end. None of the XML repositories will solve the N-squared translation problem for all businesses, unless it can establish a common model of all business information, agreed between all parties, which can then act as an interlingua for all XML translations. The chances of such a massive information model being developed consistently and completely, agreed across all countries and industry sectors, and then maintained effectively, are remote.

Therefore no company can afford to wait for these cross-industry initiatives to succeed. Instead, each company can establish its own model of business information based on its own business needs, and perform all XML translations via this model. This is a feasible undertaking, which can start to deliver results in just a matter of months. These models will vastly simplify the problems of interfacing with a changing and unpredictable outside world. At the same time, this course of action does not preclude you from taking advantage of BizTalk or any other XML repository initiative, if and when a winner emerges.