Business Maps: Topic Maps Go B2B
August 21, 2002
Interoperability between ontologies is a big issue, if not the single biggest issue, in B2B data exchange. For the foreseeable future there will not be a single, widely accepted B2B vocabulary, so we will need mappings between different ontologies. Since these mappings are inherently situational and the context is very complex, we cannot expect computers to create more than a small part of them. We need tools that leverage the intelligence of human business experts, and we need portable, reusable, and standardized mappings. Topic Maps are an excellent vehicle for providing those "Business Maps". (This article presumes a basic understanding of Topic Maps; readers may wish to read A Gentle Introduction to Topic Maps in conjunction with this article.)
We have lots of data and descriptions of data. Take, for instance, the abundance of vocabularies for B2B exchange: xCBL, FinXML, FpML, etc. Those vocabularies can be seen as ontologies. Older EDI technologies such as X12 and EDIFACT are also ontologies. As yet there are no general standards for B2B vocabularies in XML. The ebXML initiative did not include actual business documents among its deliverables; work is currently under way on the Universal Business Language (UBL) to fill this gap. Besides those "industry-strength" solutions, there are lots of tailor-made data exchanges between companies, often using nothing more than simple ASCII comma-separated files. Together with their documentation, those ASCII files also constitute ontologies. And even within larger companies, many different ontologies exist in the legacy databases of different departments. All these different data sources present huge interoperability problems.
One of those interoperability problems is finding out which data items from different sources are the same. To do that, we need to compare the meanings of those data items. This means we have to look up data definitions for different data sources and compare those data definitions. Comparing human-made definitions is a tough job. Different organizations may come up with very different definitions for things that really are the same, and with very similar definitions for things that are very different in reality.
There are several reasons for this. First, however hard we try, mistakes and obscurities creep into our data definitions. Second, in making data definitions we may find that a lot of data aren't that well defined to start with: when we write definitions for a data source, it is sometimes the first attempt to define the data at all, and when a definition already exists, it is often not precise enough. Third, when we make a definition like "an employee is a person working at a company", we introduce many new words ("person", "work", "company") from natural language. Since meanings in natural language aren't precise, those definitions aren't going to be precise either. We should stop thinking we can fix meanings once and for all in any but the most limited contexts.
Some Solutions and Why They Don't Work
There are some solutions to these problems. The first, which I shall call "the naive approach", is to make a new vocabulary which covers everything and then let everybody use it. It's an easy solution to think of, but it does not work in practice. Multiple vocabularies are a fact of life. Just think of the huge number of existing applications using legacy formats, which won't simply go away. And even for new applications, there are so many different business needs in different contexts that there's a strong drive toward specific, directly applicable vocabularies and away from generic standards, which take a long time to evolve. So the main problem is how to make the different vocabularies interoperate, not how to replace them with a single unifying standard. Developing unifying vocabularies is a good thing; the more success they have, the better. But they should be thought of as central pieces among the plethora of vocabularies, not the only ones. The success of new, unifying vocabularies will depend not so much on their inherent capabilities as on their ability to interoperate with existing vocabularies. Interoperability is the shortest route to acceptance.
Another approach to interoperability is the use of Published Subject Indicators (PSIs), as used in Topic Maps. The basic idea is to create public libraries of unique identifiers for things. We incorporate PSIs into our vocabularies and can then compare terms across vocabularies by their PSIs. An informal example:
Topic: "Last Name"; PSI: familyname
Topic: "Achternaam"; PSI: familyname
The PSIs in the English and Dutch topics allow us to conclude that both topics are about the same subject. Note that this really just shifts the problem from vocabularies to public libraries. In general, this approach is successful if the problem space consists of clearly delimited entities and there is a widely accepted canonical public library. ISO currency and country codes are examples of areas where this approach will work. OASIS is currently working on standards for PSIs in its OASIS Topic Maps Published Subjects TC. Once this work is done, the situation may improve as more PSIs are published.
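The PSI comparison can be sketched in a few lines of Python. This is a minimal illustration, not a Topic Map engine: it assumes a topic is reduced to a display name plus a set of PSI strings, and the `Topic` class, the `merge_by_psi` helper, and the "givenname" PSI are all hypothetical. Topics that share a PSI are grouped as one subject, and the grouping is followed transitively.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    psis: set[str] = field(default_factory=set)  # subject indicator identifiers


def merge_by_psi(topics: list[Topic]) -> list[set[str]]:
    """Group topic names whose PSIs overlap; overlap is followed transitively."""
    groups: list[tuple[set[str], set[str]]] = []  # (PSIs, topic names) per subject
    for topic in topics:
        psis, names = set(topic.psis), {topic.name}
        untouched = []
        for group_psis, group_names in groups:
            if group_psis & psis:  # shared PSI: same subject, so fold the group in
                psis |= group_psis
                names |= group_names
            else:
                untouched.append((group_psis, group_names))
        groups = untouched + [(psis, names)]
    return [names for _, names in groups]


topics = [
    Topic("Last Name", {"familyname"}),
    Topic("Achternaam", {"familyname"}),
    Topic("First Name", {"givenname"}),  # hypothetical PSI, for contrast
]
print(merge_by_psi(topics))
```

The English and Dutch topics end up in one group because they carry the same PSI; the result is only as good as the shared library of PSIs.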
In actual mappings between ontologies, we often do not establish semantic equivalence in the true sense that PSIs require. Consider an example. When GigaSellers decides to let Print & Send handle its invoices, invoice information flows from GigaSellers to Print & Send. Once we have found that we can use GigaSellers' "CustomerAddress" as the "invoice_address" in Print & Send's invoicing application, we stop. We do not need to find out whether the two are truly equivalent in all circumstances: there is no direct business need to find this out, and therefore the boss doesn't pay for it.
Solutions like PSIs do not work here because PSIs require true semantic equivalence. The interesting observation is that most real-world mappings are unidirectional: we translate from a source ontology to a destination ontology for a specific business process. For instance, an order goes from buyer to supplier. It does not go back (though a different document, such as an invoice or order confirmation, might). So for an order, only a translation from the buyer's ontology to the supplier's ontology is needed. This unidirectional nature of business exchange means that we usually establish not equivalence relationships but subset relationships between ontologies. In the above example, "CustomerAddress" is a subset of "invoice_address": every instance of "CustomerAddress" is a valid instance of "invoice_address". We do not know whether the reverse is true. It could very well be that GigaSellers requires its "CustomerAddress" to be a physical address where goods can actually be delivered, whereas Print & Send allows post office boxes as "invoice_address". Further, there is often no true equivalence because the data items are in different formats. GigaSellers may store all its dates in CCYY-MM-DD format and Print & Send in MM/DD/CCYY; here, too, there is no true equivalence, since a transformation is needed.
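The subset relationship can be illustrated by modelling each data item as a validity rule. In this sketch the two rules are hypothetical renderings of the GigaSellers and Print & Send requirements described above, and addresses are assumed to be plain dicts with optional "street" and "po_box" keys. Note that checking samples can only disprove a subset relationship, never prove it.

```python
# Hypothetical validity rules for the GigaSellers / Print & Send example.
# Addresses are plain dicts with optional "street" and "po_box" keys.


def is_customer_address(addr: dict) -> bool:
    """GigaSellers: a physical address where goods can be delivered."""
    return bool(addr.get("street")) and not addr.get("po_box")


def is_invoice_address(addr: dict) -> bool:
    """Print & Send: a street address or a post office box will do."""
    return bool(addr.get("street")) or bool(addr.get("po_box"))


def maps_into(source_rule, destination_rule, samples) -> bool:
    """True if every sample valid for the source is also valid for the destination.

    Passing samples cannot prove the subset relation; a single failure disproves it.
    """
    return all(destination_rule(s) for s in samples if source_rule(s))


samples = [
    {"street": "1 Main Street"},
    {"po_box": "P.O. Box 12"},
]
print(maps_into(is_customer_address, is_invoice_address, samples))  # True
print(maps_into(is_invoice_address, is_customer_address, samples))  # False
```

The mapping is usable from GigaSellers to Print & Send, but the post office box sample shows it cannot simply be turned around.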
It might be tempting to conclude that we simply have to make a mapping between every two ontologies we use. That, however, is going too far. Even when we do not establish true semantic equivalence relationships, the mappings we make can be reusable. What we need to do is capture knowledge about the mapping process itself. We need to store the fact that we can use "CustomerAddress" as "invoice_address" in this particular context. Then, when someone else needs to find out whether "CustomerAddress" can be used as "mailing_address" in a different context, they can use this information. Storing this kind of information would let us support ontology mapping with semi-automated tools that show existing mappings for the items in our ontology that we need to map onto another ontology. The human expert making the mapping still makes all the relevant choices and provides new mappings where existing ones can't be reused. Such tools could then generate a new mapping, which in turn can be stored to inform the next one. It would also become much easier to exchange information about mappings without having to provide full one-to-one equivalence relationships.
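A semi-automated tool of the kind described here needs, at minimum, a store of earlier mapping decisions that it can query by source item. The following is a minimal sketch under that assumption; `StoredMapping` and `MappingStore` are hypothetical names, and the context is reduced to a free-text string for brevity.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMapping:
    source_item: str       # e.g. "CustomerAddress"
    destination_item: str  # e.g. "invoice_address"
    context: str           # where the mapping was found to be valid


class MappingStore:
    """Hypothetical repository of earlier mapping decisions, indexed by source item."""

    def __init__(self) -> None:
        self._by_source: defaultdict[str, list[StoredMapping]] = defaultdict(list)

    def record(self, mapping: StoredMapping) -> None:
        self._by_source[mapping.source_item].append(mapping)

    def suggestions(self, source_item: str) -> list[StoredMapping]:
        """Earlier mappings of this item; the human expert decides whether they apply."""
        return list(self._by_source.get(source_item, []))


store = MappingStore()
store.record(StoredMapping("CustomerAddress", "invoice_address",
                           "GigaSellers to Print & Send, invoicing"))
for hint in store.suggestions("CustomerAddress"):
    print(f"{hint.source_item} -> {hint.destination_item} ({hint.context})")
```

The store only offers suggestions; deciding whether "CustomerAddress" may serve as "mailing_address" elsewhere remains the analyst's call, and that decision is recorded in turn.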
The Knowledge in Mappings
The model above shows the knowledge that exists in ontology mappings. First, we must distinguish between the documents exchanged in B2B transactions and the items that make up those documents. Items can belong to domains (data types), which in turn can be specializations of other domains (domains are represented in the model as items which generalize other items). The left-hand classes store information on the business documents used by the relevant business partners. There is no need to store the full definitions of the data; it is sufficient to store identifiers that uniquely identify the documents, the items in them, and the domains in use. After all, the full definitions are probably already stored somewhere; copying them would only introduce unnecessary redundancy.
On the right-hand side of the model we have the actual mappings. First, the fact that some (usually two) documents are mapped onto each other is stored in the class document-mapping. The document-mapping is related to one or more item-mappings. The item-mappings store not only the identifiers of the mapped items, but also the kind of mapping: is this an equivalence relationship, which is potentially bidirectional? Or is it a unidirectional mapping, i.e., is it a subset-superset relationship?
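Since the model diagram itself is not reproduced here, the following Python dataclasses give a rough rendering of the classes described in the two paragraphs above. Apart from document-mapping and item-mapping, which the text names, the class and field names are illustrative guesses, and the example documents reuse names from this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


# Left-hand side: identifiers only; full definitions stay where they already live.
@dataclass(frozen=True)
class Domain:
    id: str
    specializes: Optional[Domain] = None  # more general domain, if any


@dataclass(frozen=True)
class Item:
    id: str
    domain: Optional[Domain] = None


@dataclass(frozen=True)
class Document:
    id: str
    items: tuple[Item, ...] = ()


# Right-hand side: the mappings themselves.
class MappingKind(Enum):
    EQUIVALENCE = "equivalence"        # potentially bidirectional
    UNIDIRECTIONAL = "unidirectional"  # source is a subset of destination


@dataclass(frozen=True)
class ItemMapping:
    source_item: str
    destination_item: str
    kind: MappingKind

    def reversible(self) -> bool:
        return self.kind is MappingKind.EQUIVALENCE


@dataclass
class DocumentMapping:
    source: Document
    destination: Document
    item_mappings: list[ItemMapping] = field(default_factory=list)

    def unmapped_source_items(self) -> list[str]:
        """Source items that still need a human decision."""
        mapped = {m.source_item for m in self.item_mappings}
        return [item.id for item in self.source.items if item.id not in mapped]


invoice = Document("Invoice", (Item("CustomerName"), Item("CustomerAddress"), Item("TotalAmount")))
gbl_invoice = Document("invoice_document", (Item("company_name"), Item("invoice_address")))
mapping = DocumentMapping(invoice, gbl_invoice, [
    ItemMapping("CustomerName", "company_name", MappingKind.EQUIVALENCE),
    ItemMapping("CustomerAddress", "invoice_address", MappingKind.UNIDIRECTIONAL),
])
print(mapping.unmapped_source_items())  # ['TotalAmount']
```

Recording the kind on each item-mapping is what lets a tool tell whether a stored mapping may also be used in the opposite direction.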
We can also store conversions. For instance, if the destination document allows last names of only 25 characters, but the source allows last names of any length, we could specify that names need to be truncated to 25 characters. We could also store more complex transformations. These should, however, be readable by ordinary humans, which rules out XSLT "as is": the intended users of a tool based on this model are business analysts, not XML programmers. (An intelligent tool could of course store XSLT for a large class of transformations and present them in natural language.) Finally, we can store domain conversions, so we would not have to store the same MM/DD/CCYY to CCYY-MM-DD conversion for every date in the document.
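The difference between item conversions and domain conversions can be sketched as follows, assuming documents are flat dicts of item name to string value. The item names and the `convert` helper are hypothetical; the point is that a single date conversion registered for the "date" domain is applied to every item in that domain.

```python
from __future__ import annotations

from datetime import datetime
from typing import Callable, Optional

Conversion = Callable[[str], str]


def truncate(length: int) -> Conversion:
    """Item conversion: cut a value down to at most `length` characters."""
    return lambda value: value[:length]


def us_date_to_iso(value: str) -> str:
    """Domain conversion: MM/DD/CCYY -> CCYY-MM-DD."""
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")


def convert(document: dict[str, str],
            item_domains: dict[str, str],
            item_conversions: dict[str, Conversion],
            domain_conversions: dict[str, Conversion]) -> dict[str, str]:
    """Apply the item conversion if one exists, else the conversion of the item's domain."""
    result = {}
    for item, value in document.items():
        conversion: Optional[Conversion] = item_conversions.get(item)
        if conversion is None and item in item_domains:
            conversion = domain_conversions.get(item_domains[item])
        result[item] = conversion(value) if conversion else value
    return result


source = {"LastName": "Vanderbilt-Hohenzollern-Smythe",
          "InvoiceDate": "08/21/2002", "DueDate": "09/20/2002"}
print(convert(source,
              item_domains={"InvoiceDate": "date", "DueDate": "date"},
              item_conversions={"LastName": truncate(25)},
              domain_conversions={"date": us_date_to_iso}))
```

Python functions are, of course, no more readable to business analysts than XSLT; a real tool would need a human-readable description alongside each stored conversion.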
The central class is context, in the broadest sense: this class could store information on B2B vocabulary, region, country, company, business unit, timeframe, and whatever else is necessary for the mapping under scrutiny. Context applies to all other classes. If unspecified, an item mapping would usually inherit its context from the document mapping that contains it. Context also applies to the left-hand side, though there we might only want to store the vocabulary involved. Of course, the work done on ebXML's Context Drivers would constitute a good starting point for defining context.
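Context inheritance can be sketched as follows. This assumes one particular reading of "if unspecified": inheritance per facet, so an item mapping inherits every facet it does not set itself. The facet names and the "Benelux" override are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ItemMapping:
    source_item: str
    destination_item: str
    context: dict[str, str] = field(default_factory=dict)  # only facets that differ


@dataclass
class DocumentMapping:
    context: dict[str, str]
    item_mappings: list[ItemMapping] = field(default_factory=list)

    def effective_context(self, mapping: ItemMapping) -> dict[str, str]:
        """Facets the item mapping leaves unspecified are inherited from the document mapping."""
        return {**self.context, **mapping.context}


invoices = DocumentMapping(
    {"vocabulary": "Bizwords!", "company": "GigaSellers", "unit": "Sales", "region": "Europe"},
    [
        ItemMapping("CustomerName", "company_name"),
        ItemMapping("CustomerAddress", "invoice_address", {"region": "Benelux"}),
    ],
)
for m in invoices.item_mappings:
    print(m.source_item, invoices.effective_context(m))
```

An all-or-nothing reading, where any item-level context replaces the document context entirely, would be an equally plausible design.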
The data model maps quite nicely onto core Topic Map constructs, to yield what I will call "Business Maps".
| Mapping Model | Example | Topic Map construct |
|---|---|---|
| document composition | CustomerName is part of Invoice | association |
| document-to-document mapping | Invoice maps to invoice_document | association |
| item-to-item mapping | CustomerName maps to company_name | association |
| context | Vocabulary, Company, Region, Industry, ... | scope |
| external document description | http://www.bizwords.org/invoice | occurrence (type: business document description) |
| external item definition | http://www.bizwords.org/amount | occurrence (type: definition) |
| external item datatype | http://www.bizwords.org/date | occurrence (type: datatype) |
| external item example | http://www.bizwords.org/amount/example | occurrence (type: example) |
To see how this works, consider the following code fragment, taken from the Topic Map definition of "Bizwords!", a B2B vocabulary:
```xml
<topic id="name">
  <instanceOf>
    <topicRef xlink:href="itm.xtm#item"/>
  </instanceOf>
  <baseName>
    <scope>
      <subjectIndicatorRef xlink:href="http://www.bizwords.com"/>
    </scope>
    <baseNameString>CustomerName</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
      <topicRef xlink:href="itm.xtm#definition"/>
    </instanceOf>
    <resourceRef xlink:href="http://www.bizwords.com/definitions#CustomerName"/>
  </occurrence>
</topic>
```
Here a single data item is defined. It is an instance of topic type "item" (which is defined elsewhere). The topic has "CustomerName" as a name. This name is scoped by the B2B vocabulary to avoid name-based merging with topics from other B2B vocabularies which might conceivably have the same name. The topic would merge with other topics which have the name "CustomerName" within scope "http://www.bizwords.com", which is desirable since these topics would represent the same data item in the same vocabulary. The topic also has an occurrence which is a definition of "CustomerName" somewhere on the www.bizwords.com site. Note that the actual definition of "CustomerName" does not have to be repeated here. Topic Maps are indexes which only link to external data sources.
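The scoped-name merging rule can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` and wraps the fragment in a `topicMap` element that declares the XTM 1.0 and XLink namespaces, which the fragment itself leaves implicit. The second topic and its scope URI are hypothetical, and only name-based merging is modelled, not the full XTM merging rules.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"

# The "name" topic from above, plus a hypothetical topic from another vocabulary.
TOPIC_MAP = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="name">
    <instanceOf><topicRef xlink:href="itm.xtm#item"/></instanceOf>
    <baseName>
      <scope><subjectIndicatorRef xlink:href="http://www.bizwords.com"/></scope>
      <baseNameString>CustomerName</baseNameString>
    </baseName>
  </topic>
  <topic id="other-name">
    <baseName>
      <scope><subjectIndicatorRef xlink:href="http://vocab.example.com"/></scope>
      <baseNameString>CustomerName</baseNameString>
    </baseName>
  </topic>
</topicMap>"""


def scoped_names(root: ET.Element) -> dict[str, set[tuple[str, frozenset[str]]]]:
    """Map each topic id to its (base name, scope) pairs."""
    result = {}
    for topic in root.iter(f"{XTM}topic"):
        pairs = set()
        for base_name in topic.findall(f"{XTM}baseName"):
            name = base_name.findtext(f"{XTM}baseNameString", default="").strip()
            scope = frozenset(ref.get(HREF) for ref in base_name.findall(f"{XTM}scope/*"))
            pairs.add((name, scope))
        result[topic.get("id")] = pairs
    return result


def merge_by_name(a: set, b: set) -> bool:
    """Topics merge on names only if they share a base name in the same scope."""
    return bool(a & b)


names = scoped_names(ET.fromstring(TOPIC_MAP))
print(merge_by_name(names["name"], names["other-name"]))  # False: different scopes
```

Both topics are called "CustomerName", but because the names are scoped by different vocabularies they stay separate, which is exactly what the scoping is meant to achieve.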
Besides topics representing data items, the Topic Map of Bizwords! would also contain definitions of business documents and their composition. A code fragment:
```xml
<association>
  <instanceOf>
    <topicRef xlink:href="itm.xtm#composition"/>
  </instanceOf>
  <member>
    <roleSpec>
      <topicRef xlink:href="itm.xtm#containing-document"/>
    </roleSpec>
    <topicRef xlink:href="#invoice"/>
  </member>
  <member>
    <roleSpec>
      <topicRef xlink:href="itm.xtm#required-item"/>
    </roleSpec>
    <topicRef xlink:href="#name"/>
  </member>
  <!-- ... members omitted ... -->
</association>
```
This is an association of type "composition" (defined elsewhere) which says that the topic "invoice" (not shown) plays the role "containing-document", and the topic "name" (discussed above) plays the role "required-item" in this document. The association effectively describes the composition of a business document. Other members of the association are omitted for brevity, but in the GigaSellers example they would include at least "CustomerAddress" and "TotalAmount".
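Reading a document's composition back out of such an association is straightforward. This sketch again uses `xml.etree.ElementTree`, declares the XTM 1.0 and XLink namespaces on the association for parsing, and adds a hypothetical "#address" member in place of the omitted ones; `document_composition` is a hypothetical helper.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"

# The composition association from above, with a hypothetical "#address" member.
ASSOCIATION = """<association xmlns="http://www.topicmaps.org/xtm/1.0/"
             xmlns:xlink="http://www.w3.org/1999/xlink">
  <instanceOf><topicRef xlink:href="itm.xtm#composition"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="itm.xtm#containing-document"/></roleSpec>
    <topicRef xlink:href="#invoice"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="itm.xtm#required-item"/></roleSpec>
    <topicRef xlink:href="#name"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="itm.xtm#required-item"/></roleSpec>
    <topicRef xlink:href="#address"/>
  </member>
</association>"""


def document_composition(association: ET.Element) -> tuple[str, list[tuple[str, str]]]:
    """Return the containing document and its (role, item) pairs."""
    type_ref = association.find(f"{XTM}instanceOf/{XTM}topicRef")
    if type_ref is None or not type_ref.get(HREF, "").endswith("#composition"):
        raise ValueError("not a composition association")
    document, parts = "", []
    for member in association.findall(f"{XTM}member"):
        role = member.find(f"{XTM}roleSpec/{XTM}topicRef").get(HREF).split("#")[-1]
        # find() only searches direct children, so this is the role player,
        # not the topicRef nested inside roleSpec.
        player = member.find(f"{XTM}topicRef").get(HREF)
        if role == "containing-document":
            document = player
        else:
            parts.append((role, player))
    return document, parts


print(document_composition(ET.fromstring(ASSOCIATION)))
```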
Now suppose GigaSellers uses the Bizwords! B2B vocabulary, but Print & Send uses a competitor, the Galactic Business Language (GBL). GigaSellers and Print & Send could then make a Business Map using the data items defined above. For example:
```xml
<association>
  <instanceOf>
    <topicRef xlink:href="itm.xtm#unidirectional_mapping"/>
  </instanceOf>
  <scope>
    <topicRef xlink:href="context.xtm#gigasellers"/>
    <topicRef xlink:href="context.xtm#sales"/>
    <topicRef xlink:href="context.xtm#europe"/>
  </scope>
  <member>
    <roleSpec>
      <topicRef xlink:href="itm.xtm#source_item"/>
    </roleSpec>
    <topicRef xlink:href="bizwords.xtm#name"/>
  </member>
  <member>
    <roleSpec>
      <topicRef xlink:href="itm.xtm#destination_item"/>
    </roleSpec>
    <topicRef xlink:href="gbl.xtm#name"/>
  </member>
</association>
```
This association is of type "unidirectional_mapping". It links two data items in the scope "GigaSellers, Sales, Europe" (for simplicity, the scope is restricted here to GigaSellers' point of view). It states that this unidirectional mapping has been found valid in GigaSellers' European Sales department. Of course, the Business Map will contain many more such associations. It would also be possible to have associations of types other than "unidirectional_mapping": we could have a derived association type which indicates a transformation, for instance of date format MM/DD/CCYY to CCYY-MM-DD, and we could also have bidirectional mappings. So Business Maps can be a very powerful tool for capturing the knowledge in mappings for later reuse.
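To show what such an association buys in practice, here is a sketch that renames the items of a Bizwords! document instance to their GBL counterparts, using only the mappings whose scope fits the current context. It assumes associations have already been read into plain Python objects; the "bidirectional_mapping" type name and the "bizwords.xtm#total" item are hypothetical. It also assumes one particular scope semantics (a mapping applies when every topic in its scope is part of the context), which the Topic Map standards do not prescribe.

```python
from __future__ import annotations

from dataclasses import dataclass

# Association types usable in the forward direction; "bidirectional_mapping" is hypothetical.
FORWARD_KINDS = {"unidirectional_mapping", "bidirectional_mapping"}


@dataclass(frozen=True)
class MappingAssociation:
    kind: str               # association type, e.g. "unidirectional_mapping"
    scope: frozenset[str]   # context topics, e.g. {"gigasellers", "sales", "europe"}
    source_item: str        # topic playing the source_item role
    destination_item: str   # topic playing the destination_item role


def translate(instance: dict[str, str], associations: list[MappingAssociation],
              context: set[str]) -> tuple[dict[str, str], list[str]]:
    """Rename source items to destination items, using only mappings valid in `context`.

    Assumption: a mapping applies when every topic in its scope is part of the context.
    """
    usable = {a.source_item: a.destination_item
              for a in associations
              if a.kind in FORWARD_KINDS and a.scope <= context}
    translated, unmapped = {}, []
    for item, value in instance.items():
        if item in usable:
            translated[usable[item]] = value
        else:
            unmapped.append(item)  # left for a human to map
    return translated, unmapped


business_map = [MappingAssociation("unidirectional_mapping",
                                   frozenset({"gigasellers", "sales", "europe"}),
                                   "bizwords.xtm#name", "gbl.xtm#name")]
invoice = {"bizwords.xtm#name": "ACME Ltd", "bizwords.xtm#total": "100.00"}
print(translate(invoice, business_map, {"gigasellers", "sales", "europe"}))
print(translate(invoice, business_map, {"gigasellers", "sales", "asia"}))
```

In the Asian sales context the European mapping does not apply, so every item is reported as unmapped rather than silently translated.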
What's more, Topic Maps offer the facility to merge two distinct Topic Maps. This is an excellent way to compare separate, portable B2B mappings. When we have two Business Maps, say one from "Sales Europe" and one from "Sales Asia", and we want to make a new map for "Sales America" or "Marketing Europe", we can merge the existing maps. References to the same external message items will be merged. Once the Business Maps are merged, we can use scope to filter for the business processes we are interested in. Business analysts will have quick access to all relevant mapping information from prior mappings in related areas. So Business Maps can provide an easy and flexible way to reuse the knowledge stored in mappings: they make portable, reusable mappings come true.
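Merging and scope filtering can be sketched in a simplified form. This assumes that items are already identified by shared topic references (such as "bizwords.xtm#name"), so merging two maps reduces to a set union in which identical mappings collapse; real Topic Map merging also merges topics by subject identity and scoped names. The filter assumes that "in scope" means the mapping's scope contains all requested topics, and the address mapping is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source_item: str
    destination_item: str
    scope: frozenset[str]


def merge(*business_maps: set[Mapping]) -> set[Mapping]:
    """Union of maps; identical mappings collapse into one."""
    merged: set[Mapping] = set()
    for business_map in business_maps:
        merged |= business_map
    return merged


def in_scope(business_map: set[Mapping], *topics: str) -> set[Mapping]:
    """Keep the mappings whose scope contains all the given topics."""
    wanted = set(topics)
    return {m for m in business_map if wanted <= m.scope}


sales_europe = {Mapping("bizwords.xtm#name", "gbl.xtm#name", frozenset({"sales", "europe"}))}
sales_asia = {
    Mapping("bizwords.xtm#name", "gbl.xtm#name", frozenset({"sales", "asia"})),
    Mapping("bizwords.xtm#address", "gbl.xtm#invoice_address", frozenset({"sales", "asia"})),
}
merged = merge(sales_europe, sales_asia)
# Starting point for a new "Sales America" map: everything found valid in Sales anywhere.
for m in sorted(in_scope(merged, "sales"), key=lambda m: (m.source_item, sorted(m.scope))):
    print(m.source_item, "->", m.destination_item, sorted(m.scope))
```

The analyst building the "Sales America" map sees that the name mapping has already been validated in two regions, while the address mapping has only been tried in Asia.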
Quite a few things need to happen before this vision is realized. The best thing, of course, would be an accepted standard for Business Maps, which would let us exchange mappings with all companies using that standard. Note that this is a far less ambitious and more tenable goal than establishing a single unifying B2B ontology. This approach could also prove to be a viable way to achieve intra-company interoperability, still a big problem in a world of ever-merging large companies. We would need tools to support Business Maps: querying, filtering, importing and exporting, creating and editing them. And we would need a description of the properties of applications processing Business Maps, especially regarding context and scope. The Topic Map standards do not say much about what scope means, and the current notion of scope does not support what Business Maps would need. Scope is without doubt the most hotly debated issue in the Topic Map community at the moment, and there are several proposals to extend it. All in all, Business Maps could greatly facilitate human-mediated ontology mapping.
Interoperability between ontologies is one of the most important problems in B2B data exchange. For the time being, making mappings will mainly be a human job. Therefore we need a way to leverage human intelligence to make all the required B2B mappings. Portable, reusable mappings would accomplish this. Those mappings would need to store information on business document mappings and the context that applies to those mappings. Topic Maps are an excellent vehicle to store such information, thus yielding Business Maps.
The complete samples of Business Maps are available at http://www.marcdegraauw.com/itm/
The original ISO Topic Map standard: ISO/IEC 13250 Topic Maps
The XTM standard: XML Topic Maps (XTM) 1.0
The XTM standard contains a good introduction to Topic Maps: 2.1 A Gentle Introduction to Topic Maps
Alan Kotok on the current status of interoperability initiatives: Interoperability Summit: Good Intentions, Little Action