Business at XML 2002

January 8, 2003

Alan Kotok

XML Refocuses

The XML 2002 conference and expo (8-13 December 2002), this year's IDEAlliance showcase, reflected the impact of the technology recession on XML business applications. With many business customers cutting back on new technology investments, XML vendors now take a greater interest in government clients and offer their tools to help organizations integrate current applications as well as build new ones. This focus on government and integration came through repeatedly during the conference.

The subdued atmosphere of the meeting, held at the Baltimore (Maryland, USA) Convention Center, was most evident in the expo hall, which had the sound and feel of a public library rather than the raucous midways of past years. Just as libraries often hold valuable intellectual gems, many conference sessions carried important messages about commercial uses of XML, particularly XML's ability to handle the increasing complexity of business, another recurring theme.

XML For, Of, and By the People

The changing nature of the market, along with Baltimore's proximity to Washington DC, about 30 miles (50 km) to the south, gave the U.S. government a major presence at the XML 2002 conference. Many presentations, in both the general and track sessions, discussed XML's use in government systems, solutions funded by federal agencies, or applications mandated by government regulations.

In the opening keynote, Robert Haycock, acting manager of e-government in the OMB (Office of Management and Budget, the central budget and policy agency in the executive branch), laid out the proposed Federal Enterprise Architecture and XML's role in it. Haycock said that the current e-government initiatives, organized as a series of 24 projects, represent functions that match citizen needs and services and that cut across traditional agency boundaries. He said XML will play a key part in the architecture because of its ability to provide a neutral medium for information sharing and to support a common framework for delivering services, independent of platform or vendor.

One example Haycock cited is Pay.Gov, a service of the U.S. Treasury Department that is also part of the larger e-government initiative on federal asset sales. According to Haycock, Pay.Gov uses a component-based design for collections, forms submission, bill presentment, authentication, and agency reporting. The forms submission and bill presentment functions, the parts of Pay.Gov with which the public interacts, use XML.

The initiatives Haycock outlined reflect efforts by central agencies like OMB to set overall direction for XML in the government, but other presentations showed how individual agencies, particularly the Department of Defense (DoD), are moving ahead with XML for business and publishing applications.

One working DoD use of XML is a large-scale content management application in the U.S. Navy. Jon Parsons of XyEnterprise discussed XML's role in managing the large, detailed database of maintenance data covering the Navy's entire fleet and all of its contents. Not only is the scale of this endeavor enormous (the database represents nearly half a million separate pages of maintenance requirement cards, or MRCs), but the data also changes and grows constantly and must be published in several different print and online media.

The Navy had recognized the structured nature of this information and began using SGML for the application a few years ago. Parsons said that with the arrival of XML, the Navy could integrate its existing base of SGML files with new XML documents. The production process follows a regular schedule of updates every six months but also supports event-driven updates when circumstances, such as engineering research data or field experience, require changes.

Parsons cited return-on-investment (ROI) statistics, such as a 50% reduction in the cost of document updates and a drop in the time needed to perform an ad hoc update from eight weeks to five days. But the ultimate return in this case is that XML (and earlier SGML) made this capability possible at all for the Navy, which publishes the maintenance data with off-the-shelf software based on open standards. Without XML and SGML, the Navy would likely have had to develop its own maintenance publishing system, at a much higher initial cost and probably with much more care and feeding.

Other conference sessions described productive uses of XML to solve specific problems in government and improve the way agencies serve their customers. Few government operations are done on a small scale, and the Census is no exception. Steven Schafer of Fenestra Technologies, a contractor to the U.S. Census Bureau, described the development of an XML-based graphics vocabulary to help conduct the 2002 Economic Census.

Every five years, the Census Bureau conducts this survey, which covers all basic business activity in the country and provides important baselines for policy decisions. The survey reaches the five million businesses in the United States using some 650 different survey forms, each running 10 to 12 printed pages. In five years' time, the nature of business changes so much that many of the forms needed redesigning.

Beyond the sheer volume, the Census Bureau had important business and production goals. Forms redesign took too long, which hampered the work of the business domain experts who created the forms. The Census Bureau wanted to bring the process as close to real time as possible, and it also needed a common repository for the printed and online forms that would store both content and layout data.

The forms themselves needed strict visual fidelity, which means a document will be rendered identically on any output device. Census's experience with survey forms indicates that even subtle differences in rendered output can affect the responses collected by the surveys; inadvertent line breaks, for example, are unacceptable.

Schafer said Fenestra needed to write a new XML graphics vocabulary because the Extensible Stylesheet Language Formatting Objects (XSL-FO) specification was still in preparation, and the XSL-FO draft available at the time did not appear to offer enough precision for this work. The Survey Formatting Objects (SFO) vocabulary developed for the job used some of the same principles as XSL-FO, such as the separation of layout from content, but SFO gave the forms designer more control.

SFO exercises control over the two-dimensional flow of forms, allowing for precise placement of the objects. It divides the form page into regions, with the different regions stacked on the page. SFO also allows for the nesting of regions and the use of different borders in the different regions. Schafer reported that the use of XML helped improve the exchange of the forms data among Census operations, but they ran into some problems with the heavily nested forms that made it difficult to track the flow of the form.
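The report does not describe SFO's actual markup, but the nesting problem Schafer mentioned is easy to picture. The following Python sketch assumes a hypothetical SFO-like vocabulary (the `page`, `region`, and `text` elements and the `border` attribute are invented for illustration) and uses the standard library's ElementTree to measure how deeply regions nest and to recover the top-to-bottom reading order of the fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the report does not give SFO's element names, so
# "page", "region", "text", and the "border" attribute are illustrative only.
FORM = """
<page>
  <region border="solid">
    <text>Section A: Establishment information</text>
    <region border="none">
      <region border="dotted"><text>A1. Legal name</text></region>
      <region border="dotted"><text>A2. Mailing address</text></region>
    </region>
  </region>
  <region border="solid"><text>Section B: Sales</text></region>
</page>
"""


def region_depth(elem):
    """Return how many region levels are nested at or below elem."""
    deepest_child = max((region_depth(child) for child in elem), default=0)
    return deepest_child + 1 if elem.tag == "region" else deepest_child


def reading_order(elem, path=()):
    """Yield (region path, text) pairs in document order.

    Sibling regions are numbered 1, 2, ... in the order they are stacked,
    so a path such as (1, 1, 2) locates a field inside nested regions.
    """
    region_index = 0
    for child in elem:
        if child.tag == "region":
            region_index += 1
            yield from reading_order(child, path + (region_index,))
        elif child.tag == "text":
            yield path, child.text


root = ET.fromstring(FORM.strip())
print("nesting depth:", region_depth(root))
for path, text in reading_order(root):
    print(".".join(str(i) for i in path), text)
```

Run as-is, it reports a nesting depth of 3 and numbers the fields 1, 1.1.1, 1.1.2, and 2. Even in this tiny form, a field's position depends on every enclosing region, which suggests why deeply nested forms made the flow hard to follow.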

Joe Carmel of the U.S. House of Representatives described (with his colleague Cindy Leach demonstrating) XML's use for drafting legislation, a topic XML.com reported on last May ("Can XML Write the Law?"). Legislative drafting takes place in a legalistic and traditional environment, but with the volume of legislation increasing (some 3,600 bills a year) and legislation getting more complex, the U.S. Congress needed a solution that would improve the old process while preserving its existing base.

The Congress has a history with automation, including the use of automated typesetting equipment at the Government Printing Office (an agency of the Congress), and, like the Navy, it has used SGML. Because legislation follows a specified structure, it lends itself to markup languages. But before XML, the main tool for drafting legislation was a DOS-based text processor that required the attorneys drafting bills to learn typesetting codes, perhaps not the best use of a lawyer's time.

The arrival of XML, based on SGML but simpler and less expensive to implement, encouraged a solution that would make automation easier for the users. As Carmel explained, the Congress needed a WYSIWYG user interface that would hide the typesetting codes and offer templates presenting the structure of the documents while hiding the underlying schemas.

The solution used customized versions of Corel's XMetaL, which offers a WYSIWYG screen and templates. The system (implemented in 2001 and 2002) also lets Congress add management support for tracking cross-references to other legislation and improve navigation through the bills, which can become complex as they move through the legislative process. Most importantly, it lets lawyers be lawyers rather than typesetters or computer experts.

Government Integration

Many of the business sessions discussed ways XML can help companies share data more easily among different applications and vendors, a use of XML particularly suited to web services. Don Box of Microsoft, one of the general session speakers and an author of the original SOAP specification, discussed the tension between business-to-business (inter-organizational) architectures on one hand and enterprise integration architectures on the other. The inter-organizational architecture uses a coarse-grained, loosely coupled model, reflecting the Internet and the Web. The enterprise integration architecture was meant to stay within organizational boundaries and requires tighter coordination.

One of the track sessions later in the day reflected this tension between internal integration and business-to-business exchanges, almost as Box had described it. Al Gough of AMS discussed an API based on XML and web services designed for DoD procurement. Kevin Mitchell of Agogo Networks, who also worked on the project, gave part of the talk.

AMS had already developed a procurement API for DoD but needed to open up the architecture to make it suitable for department-wide use and public interactions. The original architecture used tightly coupled components, but the public API called for a more open, standards-based design that could be used directly without special tools. As with any DoD system, this API needed to be scalable and secure, supporting PKI for encryption. AMS also required that the payloads use XML.

Gough said these requirements suggested a web services solution. Besides supporting an open, standards-based architecture, web services also allowed clean delivery of the XML payloads. Gough explained that the architecture used a series of layers but designed them to interact with each other, allowing direct exchanges among the components as needed.

For example, consumer (non-procurement) applications can interact with the procurement system through either the internal or the public API, working through an enterprise application integration (EAI) adapter. In some cases, however, external applications can do business with the procurement system through the public API alone, bypassing the EAI adapter. The EAI layer will also allow later integration with ebXML or BizTalk. Handling these various contingencies requires a more open, standards-based architecture than before.

Gough said the business payloads are based on XML DTDs and cover inbound and outbound transactions such as requisitions, RFPs, offeror responses, awards, application advice, closeouts, and milestones. For semantic metadata harmonization, the project adopted the Universal Data Element Framework (UDEF) used in the aerospace industry. UDEF helps identify the semantics of elements defined in a schema, such as a vocabulary written for a specific industry. With UDEF, one assigns a neutral code to the metadata, which then provides a means of translating metadata across industries.
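The idea behind UDEF's neutral codes can be sketched as a pair of lookup tables. In this Python sketch, the two industry vocabularies and the codes `N-0001` and `N-0002` are hypothetical; actual UDEF identifiers follow their own structured scheme, which is not reproduced here. Each vocabulary maps its own element names to a shared neutral code, and translation runs from source name to neutral code to target name.

```python
# Hypothetical vocabularies and neutral codes. Real UDEF identifiers use their
# own structured notation, which this sketch does not reproduce.
INDUSTRY_A = {"PONumber": "N-0001", "ShipDate": "N-0002"}
INDUSTRY_B = {"PurchaseOrderIdentifier": "N-0001", "DateShipped": "N-0002"}


def translate(record, source_vocab, target_vocab):
    """Rename a record's fields from one vocabulary to another via neutral codes."""
    neutral_to_target = {code: name for name, code in target_vocab.items()}
    translated = {}
    for name, value in record.items():
        if name not in source_vocab:
            raise KeyError(f"{name!r} has no neutral code in the source vocabulary")
        code = source_vocab[name]
        if code not in neutral_to_target:
            raise KeyError(f"neutral code {code!r} has no term in the target vocabulary")
        translated[neutral_to_target[code]] = value
    return translated


order = {"PONumber": "A123", "ShipDate": "2002-12-10"}
print(translate(order, INDUSTRY_A, INDUSTRY_B))
```

The call above prints `{'PurchaseOrderIdentifier': 'A123', 'DateShipped': '2002-12-10'}`. Because each vocabulary maps only to the neutral codes, adding another industry means writing one more table rather than one mapping for every pair of industries.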

A general session earlier in the program showed that for some organizations, even internal integration needs an open architecture. Stephen Katz of the U.N.'s Food and Agriculture Organization (FAO) and John Chelsom of CSW Group in the U.K., a consultant to FAO, talked about FAO's need to pull together the work of its disparate programs under a single World Agricultural Information Centre (WAICENT). According to Katz and Chelsom, this project directly affects FAO's ability to help U.N. member governments modernize their food production and distribution activities.

As Katz and Chelsom described it, FAO had the integration project from hell. Unlike the DoD systems described by Gough and Parsons, where a tradition of top-down command can help enforce policies, FAO (like the rest of the U.N.) works in a tradition of consensus-based decisions. WAICENT had to pull together data from 200 different sources produced by hundreds of individual development groups within FAO. Yet FAO still had to speak with a single voice to the world and could not tolerate a situation in which asking different parts of FAO the same question produced different answers.

The decentralized nature of FAO meant more than integrating different applications. It also meant reconciling differences in data consistency and conventions, as well as various data storage and publishing formats. Moreover, according to Katz and Chelsom, FAO has no common underlying computing architecture, with some users developing with Microsoft tools and others working in a J2EE environment. And just to make the work really interesting, FAO operates in five different languages: English, French, Spanish, Chinese, and Arabic, with Russian soon to be added.

The project team started with an application that FAO calls its Country Profiles. The application draws on several different internal FAO databases, as well as external data sources (e.g. World Bank and BBC) to provide country-specific information on agriculture and development. With this first application, the project team hoped to create a model for quickly and easily developing further web applications and an integration infrastructure that encourages interoperability among FAO systems and information sources.

Katz and Chelsom described the solution as one built on the legacy applications ("solutions that work" as they reminded the audience) using web services. The architects also chose XML as the common vernacular to cut across the various metadata and spoken languages in FAO. They called their solution an information bus that wrapped the various applications in web services and connected them with common XML vocabularies.

The information bus uses a Universal Description, Discovery, and Integration (UDDI) registry to identify internal FAO resources, with a separate UDDI registry for external resources. The solution also specifies SOAP for messaging. To define a common language among the legacy applications, the information bus uses metadata based on ISO standards such as ISO 3166 for country identification, ISO 639-1 for languages, and ISO 4217 for currencies.
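Shared code lists are what let legacy applications agree on what a value means. This Python sketch checks a metadata record against ISO 3166-1 alpha-2 country codes, ISO 639-1 language codes, and ISO 4217 currency codes. The code sets are small illustrative subsets (the language set is FAO's five working languages plus Russian), the `ResourceMetadata` record is a hypothetical structure, and the choice of the alpha-2 form of ISO 3166 is an assumption; the report does not say how FAO's information bus actually validates its metadata.

```python
from dataclasses import dataclass

# Small illustrative subsets; a real system would load the full code lists.
ISO_3166_ALPHA2 = {"BR", "CN", "FR", "IT", "KE"}   # countries (assumed alpha-2 form)
ISO_639_1 = {"ar", "en", "es", "fr", "ru", "zh"}   # FAO's languages, plus Russian
ISO_4217 = {"BRL", "CNY", "EUR", "KES", "USD"}     # currencies


@dataclass
class ResourceMetadata:
    """Hypothetical metadata record attached to a resource on the bus."""
    country: str
    language: str
    currency: str


def problems(meta):
    """Return a message for each field whose value is not a recognized ISO code."""
    checks = [
        ("country", meta.country, ISO_3166_ALPHA2),
        ("language", meta.language, ISO_639_1),
        ("currency", meta.currency, ISO_4217),
    ]
    return [f"{field}: {value!r} is not a known code"
            for field, value, allowed in checks if value not in allowed]


print(problems(ResourceMetadata("IT", "en", "EUR")))
print(problems(ResourceMetadata("Italy", "English", "EUR")))
```

The first record passes with an empty list; the second is flagged twice, because "Italy" and "English" are free-text labels that another application might spell differently, while `IT` and `en` are unambiguous.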

The solution manages metadata in XML, with metadata vocabularies and ontologies stored in a repository. The repository uses RDF to specify the properties of the resources described by the metadata, and developers can optionally use RDF Schema and Topic Maps to define the ontologies.
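RDF describes resources as subject, property, value triples. The Python sketch below keeps a few triples in a plain set and serializes them in the N-Triples format, without any RDF library. The resource URI is hypothetical, and although the Dublin Core element URIs are real, using them here is an assumption rather than FAO's documented repository schema.

```python
# Hypothetical resource URI; the Dublin Core property URIs are real, but their
# use here is an assumption, not FAO's documented schema.
DC = "http://purl.org/dc/elements/1.1/"
PROFILE = "http://example.org/fao/country-profiles/IT"

triples = {
    (PROFILE, DC + "title", "Country Profile: Italy"),
    (PROFILE, DC + "language", "en"),
    (PROFILE, DC + "coverage", "IT"),
}


def objects(subject, predicate):
    """Return the values of one property for one resource."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def escape_literal(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(graph):
    """Serialize (subject URI, property URI, literal) triples as N-Triples lines."""
    return "\n".join(
        f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in sorted(graph)
    )


print(objects(PROFILE, DC + "language"))
print(to_ntriples(triples))
```

Because each statement is a separate triple, a new property can be attached to a resource without changing a table layout or schema, which suits a repository gathering metadata from many independent sources.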

Business Integration

The complex internal environment faced by FAO gave a demonstration of XML's ability to handle complexity, a condition often arising in business-to-business data exchange scenarios. While the XML 2002 conference had few business-to-business case studies, the event had several sessions, both general and track, that showed how XML can address the complexities of communicating between organizations.

The conference had several sessions on the Universal Business Language (UBL), which builds on the ebXML core components work and the XML Common Business Library and has in the pipeline a component library, a set of standard XML documents, and an extension methodology. The Accredited Standards Committee (ASC) X12, also building on the ebXML platform, presented a different approach to semantic interoperability. Mike Rawlins and Lisa Shreve described the organization's reference model for XML design, which provides a framework for electronic documents. ASC X12 is the group accredited by ANSI for electronic business message standards, which for most of its history meant EDI.

The X12 reference model offers a framework for seven levels of granularity in business messages from the complete electronic document instance at the top to single atomic pieces of data at the bottom, called primitives. The reference model's architecture assembles the interchangeable parts into messages, giving the details about parties, resources, events, and locations (the Who, What, When, and Where) that businesses need to provide specified goods or services. Rawlins and Shreve said this approach can accommodate the needs of different industries, yet offer a migration path for the large installed base of EDI.

One of the more common business data exchanges involves interactions for financial reporting, such as between companies and their auditors or even the publication of press releases that can have financial implications. On the conference's last day, Walter Hamscher of XBRL International discussed the use of XLink in the Extensible Business Reporting Language (XBRL), a vocabulary designed originally for the transmission of standard accounting reports, but extended to cover other business reporting functions.

Hamscher said XBRL consists of five basic schemas but must address business reporting under a wide variety of legal, accounting, and national contexts. He described the basic facts as surrounded by these varying contexts, an idea similar to the ebXML core components underpinning UBL and the X12 XML reference model.

XBRL uses sets of taxonomies to represent the various contexts and supplement the basic schemas. It needed a way to address these contexts flexibly while leaving the basic documents intact, and it chose XLink. XLink allows elements to be inserted in XML documents to create and describe links between resources.

XBRL uses these XML links (which are richer than the familiar hyperlinks in HTML) to represent the relationships between basic XBRL metadata and the different contexts. XBRL then stores these links in a separate database, cross-referenced in a series of tables. This use of references and tables allows extensions to cover even arcane or local variations (Hamscher cited unusual lease laws in Hong Kong as an example) without disturbing the underlying basic schema.
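The following Python sketch shows how XLink's extended links can express the kind of relationship Hamscher described, and how those links can be flattened into a table. Only the `xlink:type`, `xlink:href`, `xlink:label`, `xlink:from`, `xlink:to`, and `xlink:arcrole` attributes come from the XLink specification; the `links`, `loc`, and `arc` element names, the schema file names, and the arcrole URI are hypothetical and are not taken from XBRL.

```python
import xml.etree.ElementTree as ET
from itertools import product

XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical link document: element names, file names, and the arcrole URI
# are illustrative; only the xlink:* attributes come from the XLink spec.
LINKBASE = """
<links xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <loc xlink:type="locator" xlink:href="core.xsd#LeaseExpense" xlink:label="lease"/>
  <loc xlink:type="locator" xlink:href="hk-local.xsd#GovernmentLeaseRent"
       xlink:label="hkRent"/>
  <arc xlink:type="arc" xlink:from="lease" xlink:to="hkRent"
       xlink:arcrole="http://example.org/arcrole/local-variant"/>
</links>
"""


def relationship_table(xml_text):
    """Turn an XLink extended link into (from href, arcrole, to href) rows."""
    root = ET.fromstring(xml_text.strip())
    # Several locators may share a label, so map each label to a list of hrefs.
    hrefs_by_label = {}
    for elem in root.iter():
        if elem.get(XLINK + "type") == "locator":
            label = elem.get(XLINK + "label")
            hrefs_by_label.setdefault(label, []).append(elem.get(XLINK + "href"))
    rows = []
    for elem in root.iter():
        if elem.get(XLINK + "type") == "arc":
            sources = hrefs_by_label.get(elem.get(XLINK + "from"), [])
            targets = hrefs_by_label.get(elem.get(XLINK + "to"), [])
            # An arc connects every resource carrying the "from" label
            # to every resource carrying the "to" label.
            for src, dst in product(sources, targets):
                rows.append((src, elem.get(XLINK + "arcrole"), dst))
    return rows


for row in relationship_table(LINKBASE):
    print(row)
```

The single row links the core lease concept to its hypothetical Hong Kong variant, and `core.xsd` is never edited. For brevity the sketch does not handle arcs that omit `xlink:from` or `xlink:to`, which XLink treats as referring to all the labels in that extended link.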

While the technology recession may not be kind to XML at the moment, other indicators point to better times ahead. During the same week as XML 2002, Intel Corporation reported transacting some US $5 billion worth of business using RosettaNet messages, about 10 percent of its supplier purchases (see the report on the DISA newswire). Numbers like those could bring some of that exuberance back to XML conferences.