Schema Repositories: What's at Stake?
Part One
No race has been at once as enigmatic and as heated as the race to be Your Source for XML Schemas and DTDs. Since the publication of XML 1.0, the schema-writing public has been graciously invited to deposit its intellectual property on other folks' web sites. (Here and throughout the article I use "schema" in the general sense, including XML Schemas, DTDs, etc.) When the invitations came from out of the blue, there was no incentive to respond; but with the world's largest software company sending out invitations you can't refuse, and the compulsion to offer an alternative now galvanizing support for the Organization for the Advancement of Structured Information Standards (OASIS), it's time to figure out what's at stake.
Here's a theory: It's not the siting and cataloging of schemas that is important; it's the potential relationship to the content of the schemas that is the ultimate prize. If the sole intent was to be a source of schemas and useful information about schemas, then either contending organization might have taken seriously the question of what the schema-seeker needs right now and in the near term. But to date, neither repository seems grounded in real-world, current requirements. The registry and repository sites launched by Microsoft and OASIS (BizTalk and XML.org, respectively) will come into their own only when they can dish out schemas that are part of a comprehensive and cohesive framework. In fact, both sites are being developed in conjunction with such frameworks.
Before talking about what the schema-seeking public really needs, who's calling the shots, and where all this is leading, let's see what the BizTalk and XML.org repository efforts offer and how they compare.
| | BizTalk.org repository | XML.org repository |
|---|---|---|
| Control | Microsoft Corp. | Sponsor organizations: IBM, Oracle, SAP, Sun Microsystems, CommerceOne, DataChannel, Documentum, SoftQuad |
| Advisory | Membership not published, but partial list of 29 organizations, including OAG, the DoD, CommerceOne, and RosettaNet, supplied on request.* | A project of OASIS, so it is possible to say that OASIS, with 150 members, is an "advisory" to XML.org. |
| Policy on Receipt of Material | Use XDR with BizTalk wrapper tags; statement on site says it will support W3C schemas when done. | Use standard schema language (W3C or ISO DTDs until W3C schema language is done). |
| Services | Discovery and hosting (search for and retrieve actual schemas and supporting documents from repository) | Current: catalog of links to sites developing schemas. Future: discovery and hosting; referral of queries to alternate repositories |
| Schemas listed | Hosts 212-250 schemas searchable in 11 industry categories from more than 50 organizations. | Links to over 100 schema-producing organizations listed in 45 categories. |
| Supporting material | Sample document instance and documentation for each schema; documentation according to template | Future: DTD/schema and supporting files |
| Interface | Keyword search or select industry and organization | Browse list organized by industry and organization |
| Descriptive documents | BizTalk Framework 1.0 Independent Document Specification (applies to business schema language more than repository). Linked from BizTalk.org. | OASIS R&R Technical Committee; History |
* The list is not posted on BizTalk's site, but is "public." In response to a query, Chris Kurt supplied this list: "American Petroleum Institute, Ariba, Baan, Boeing, Clarus, CommerceOne, Compaq, Concur, Dell, DISA, EXE, Extricity, Ford, GEIS, Harbinger, i2, Intelysis, JDEdwards, US DOD, Merrill Lynch, Neon, Open Applications Group, Pivotal, Reuters, RosettaNet, SAP, Siebel, UPS, webMethods, and others."
The BizTalk Repository
It's hard to tell exactly how many schemas are on the BizTalk server, as there is no browse interface (selection is either by keyword or by industry and organization). A search on "unclassified" returns 212 hits, and press releases claim over 250 on the site. If you want a purchase order schema, you can search on those keywords and get 19 hits of various types. If you want a manufacturing purchase order schema, you can look at organizations listed as schema providers under "manufacturing," but you can't search by keyword within that category. As the number of listed schemas grows, this interface will need some work.
Presentations on BizTalk (there were three at XML '99) canonically reiterate its three components:
- the repository
- the BizTalk schemas
- the BizTalk Server, which is a commercial Microsoft product
The BizTalk Framework document deals almost exclusively with the BizTalk schema tags, and the rambling, undated "BizTalk Philosophy" does not define repository requirements. An overview of the framework says:
The BizTalk Framework Web site will be an interactive place where industry groups and developers can publish their schemas. The Web site will allow public and private publication based on the decision of the publishing organization. Once a BizTalk Framework schema is accepted and published, the repository will provide versioning and specialization support for BizTalk Framework schema adoption and alteration. The repository will support dynamic detection of schemas, processes and visualization maps connected to any given version of a BizTalk Framework schema.
A press release on December 15, 1999, gives some indication of how the repository is to be viewed:
One hundred and fifty organizations are now registered as schema publishers on www.biztalk.org. "We're far and above in the lead," says Dan Rogers, Program Manager of www.biztalk.org. "The difference between our library and others is the richness and correctness of the content." No other schema library or so-called "repository" validates the technical correctness of schemas.
The PR is consistent with presentations made in Philadelphia, which indicated that Microsoft sees this as something of a horse race based on numbers of schemas and quality of supporting documents and services. The release states, "Another important feature to look for in a schema library is run-time hosting.... Hosting allows an application that is using a schema to access the schema over the Internet at any time." I did not find any further indication of exactly what this means or how it is implemented on the current site. Advisory committee members are already privy to the draft 2.0 spec for the Framework.
XML.org
The XML.org repository is the work product of OASIS, officially an "initiative" of the consortium, which has grown from a few dozen to over 150 members, with eight sponsors putting up half a million dollars total to jumpstart the repository. (Four partner sponsors have paid a $100,000 entry tab and four affiliate sponsors have paid $25,000 to get the site up and running.) The project will adopt the specifications developed by the OASIS Registry and Repository Technical Committee, chaired by Terry Allen of CommerceOne. The actual site is under the direction of Craig Chevrier, recently hired as XML.org managing editor.
The current site catalogs and links to schema-writing organizations, from the American Institute of CPAs to the Workflow Management Coalition. This catalog is the precursor to the actual registry. By April, according to Chevrier, they hope to be opening their doors to deposit of actual schemas, as BizTalk.org does now.
The XML.org catalog has a browse interface, which, in the absence of a robust taxonomy or classification scheme, gives a better overview than the limited search and classification system of BizTalk, but again, won't scale up to thousands of entries. Chevrier says they have not yet decided on an interface, but the goal is to allow querying by keyword, application type, and industry. Browsing will be maintained as long as it stays manageable, but may become less and less viable as the volume increases. The site itself is undergoing a major revision that should be up by early February.
The documents describing the OASIS Registry posted by Allen's technical committee apply generally to XML schema repositories. According to Rogers, the BizTalk site will use the OASIS specification when it is complete, the idea being that interoperability between repositories will allow a query to be passed to an alternate source.
"We're tracking the progress of that work, and will make any change to our software that we feel is appropriate once the specification reaches a mature state and other schema libraries start implementing it and need to interoperate. We're working on defining automation interfaces for this purpose as well."
Since the Microsoft specifications are not public, the Registry TC documents apply generally to both sites.
Are Repositories Useful?
Here are the use cases projected for the repositories (explicitly for XML.org and implicitly for BizTalk.org) and summaries of why, on closer examination, I think the area of application may be significantly narrower in the near term:
Obtain schema (and other required supporting files, such as stylesheet) automatically on receipt of a document referencing an unknown schema.
Counter: If you don't know the information model of the schema, or if it has changed, retrieving the schema won't automate interoperability. If you do know the information model underlying the schema, and you are using it on a real-time, transactional basis, you will likely download it once and maintain it locally, rather than hitting the repository server every time you need to parse an instance. If you aren't convinced by this argument, search for "purchase order" on BizTalk.org. The 19 hits (as of 1/20/00) include general and specialized documents, complementary and contradictory approaches, and pieces of larger schema frameworks. If you know ahead of time which one you want, finding it here might be convenient. If you don't know, this selection would represent the beginning of your research, not its conclusion.
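The download-once habit described above can be sketched in a few lines. This is a minimal illustration, not a description of how either repository works: the `LocalSchemaCache` class, the `fake_fetch` stand-in, and the `repository.example` URL are all hypothetical names invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


class LocalSchemaCache:
    """Keep one local copy of each schema; contact the repository only on a miss."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical: returns the schema bytes for a URL
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, url: str) -> Path:
        # Hash the URL so any schema location maps to a safe file name.
        return self.cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():
            return path.read_bytes()  # local copy: no trip to the repository
        data = self.fetch(url)        # first sight of this schema
        path.write_bytes(data)
        return data


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real request to a repository server (assumption).
    calls.append(url)
    return b"<!ELEMENT PurchaseOrder (Item+)>"


cache = LocalSchemaCache(Path(tempfile.mkdtemp()), fake_fetch)
first = cache.get("http://repository.example/po.dtd")
second = cache.get("http://repository.example/po.dtd")
```

Parsing two instances against the same schema costs the repository a single request; everything after that is served from local disk, which is why a high-speed "utility" repository matters less than it first appears.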
Upload schema and supporting files, thus taking the burden of being a schema server off the creator. Files may be available for archival access (slow retrieval) or utility access, where the server would require high speed and possibly high bandwidth. Posting material can also solicit useful feedback.
Counter: The arguments against utility usage seem the same as above: if you use it frequently, you will fetch your own copy once. If an update is made, you will fetch the new schema once. This process can be automated as long as the revision does not affect the relationship to your local information model and how instances are processed locally. But how will the updated processing system know that a change in datatype means a commensurate change in local processing? It seems difficult to automate this level of discretion without a set of ground rules on the range of changes possible within an "update."
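To make the datatype problem concrete, here is a rough sketch of what an automated update check can and cannot do. It assumes a deliberately simplified, made-up schema vocabulary (`<element name="..." type="..."/>`), which is neither XDR nor the W3C schema language, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def declared_types(schema_text: str) -> Dict[str, Optional[str]]:
    """Map each declared element name to its declared datatype."""
    root = ET.fromstring(schema_text)
    return {el.get("name"): el.get("type") for el in root.iter("element")}


def datatype_changes(
    old: str, new: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old_type, new_type)} for every declaration that differs."""
    before, after = declared_types(old), declared_types(new)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(before.keys() | after.keys())
        if before.get(name) != after.get(name)
    }


# Two versions of a hypothetical schema in the simplified vocabulary.
OLD = """<schema>
  <element name="quantity" type="int"/>
  <element name="price" type="string"/>
</schema>"""

NEW = """<schema>
  <element name="quantity" type="int"/>
  <element name="price" type="decimal"/>
  <element name="currency" type="string"/>
</schema>"""

changes = datatype_changes(OLD, NEW)
needs_review = bool(changes)  # any difference is flagged for a human to look at
```

The script can report that `price` changed type and that `currency` appeared. What it cannot report is whether local processing has to change as a result, and that is exactly the discretion that remains hard to automate.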
Register without deposit to gain visibility, but maintain local control from original site or alternate repository.
Counter: None. This is the library or catalog function of the repository, consonant with the archival search and browse functions. This seems quite reasonable and immediately useful.
Browse or search for schema for new editing application. End user may not even be aware of use of XML or invocation of remote schema. (Example: I'm listing my house with a real estate broker, but don't have the right schema. My editing application hits a repository, finds and downloads the correct schema, and customizes itself for my data input.)
Counter: At XML '99, all three vendors showing XML document editing tools promoted easily customizable, schema-specific applications (Arbortext's Adept Lite, SoftQuad's XMetaL, Excosoft's Documentor). Adapting any of these to a schema today is neither an automated nor an end-user process. The vendors have done much to lower the bar, but the task still requires integration and programming. Fully automating the process will require a major chunk of work in the implementation and execution of editing tools and interfaces. I'd really love to see this, and I applaud the writers of the OASIS Use Scenarios for looking ahead (it's a refreshing change from looking backward at the word processing paradigm), but I don't think this is a use case applicable to repositories in 2000 or 2001.
In summary then, the repository use cases that are compelling, at least for the near-term, are the discover-what's-out-there, look-at-it, and evaluate-it yellow pages scenarios.
What We Really Need in a Repository
There are essentially three levels of utility a repository could provide:
- A yellow-pages-like listing of anyone willing to pay the price of admission and conform to minimal constraints
- A reference-librarian or encyclopedia-like resource that informs and guides users to the information they really need
- A dynamic, real-time source for schema location during transactional processing
Both XML.org and BizTalk could grow into a yellow pages for schemas. However, to guide users to the right schema (that is, to be a reference library rather than a phone book), the sites will need to put some more muscle and moxie into the project and produce more than just a flat list of everything that comes their way.
Currently, BizTalk touts its validation service, but if this is anything above a validating XML parser, it is not obvious. BizTalk also ranks schemas by what they call "use counts." This number is the number of individuals registered with BizTalk who ask to be notified if a change is made to a schema. Perhaps it indicates something, but moving from "tell me if Foo.xdr gets tweaked" to "Foo.xdr is mission-critical to my business" is an unwarranted leap of faith. XML.org proposes a similar metric: tracking the number of downloads. But this won't work either. Here's why:
Let's say I post Lioras.RadiologyExam.DTD on the XML.org site, and a dozen integrators, from the Mayo Clinic to King Faisal Hospital, download the thing to see what the heck I've done. The indicators would be "high usage." Meanwhile, the American College of Radiology has ACR.RadiologyExam.DTD, which is listed for reference on both sites. But everyone in the field has been tracking the development of the document, is a member of ACR, and gets their copy directly from the ACR site. Result: "low usage."
In short, measuring status or usage hasn't gotten more than a lick and a promise from either site. Users need to know what is really standards conformant. They need to find out what is used by whom; what experience others have had working with the schema; and its relationship to other schemas. If not a critical edition, at least we need an Amazon.com-like source of user feedback and a NY Times best seller list version of popularity.
Dan Rogers of BizTalk indicated that Microsoft had no plans to qualify or make judgements on schemas, and that usage indicators would become more representative as traffic to the site and use of schemas rose. Laura Walker, Executive Director of OASIS, on the other hand, suggested that "long term, OASIS will play more of a role in arbitrating the standards and offering opinions on the validity and viability of the standards." She believes that it is too early to add this layer of valuation, that the repository should get started on a "democratic" basis while experience is compiled on the various schemas. According to Walker, "More needs to be done in the process of downloading and testing, then using and applying the schemas. 12-18 months from now, this will change."
For one-stop schema shopping, a repository will not only need to guide a user to the right model, it will need to provide an unambiguous information model documenting its semantics. If I'm going to map my local database to information via that schema, I need to know its information model and its relationship to other models and schemas.
The OASIS Registry TC design principles call for "providing DTDs and schemas, and an interface to their metadata, before proceeding to other matters." BizTalk-hosted schemas have rudimentary documentation on site. Neither BizTalk nor OASIS will necessarily set the context required for "semantic interoperability," the sine qua non of the exchange world.
Semantic interoperability means that when I send you my XML instance, you not only can parse it against a known schema, but you know what the components mean and can relate them to your local information model. To pull a schema off the shelf or down from a repository site and put it to work, the schema has to be a known quantity, part of a known framework of interoperable schemas or one with an unambiguous derivation from a known information model.
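A small sketch shows the gap between parsing an instance and understanding it. Everything here is assumed for illustration: the element names, the `FIELD_MAP` table, and the local field names are hypothetical and do not come from any schema on either site.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical mapping from schema element names to a local information model.
# Somebody has to write this table by hand, from the schema's documentation.
FIELD_MAP: Dict[str, str] = {
    "PONumber": "order_id",
    "ShipTo": "delivery_address",
}


def to_local_record(instance_text: str) -> Tuple[Dict[str, str], List[str]]:
    """Translate an instance into local fields; collect elements with no mapping."""
    root = ET.fromstring(instance_text)
    record: Dict[str, str] = {}
    unmapped: List[str] = []
    for child in root:
        local_name = FIELD_MAP.get(child.tag)
        if local_name is None:
            unmapped.append(child.tag)  # parses fine, but we don't know its meaning
        else:
            record[local_name] = (child.text or "").strip()
    return record, unmapped


INSTANCE = """<PurchaseOrder>
  <PONumber>A-17</PONumber>
  <ShipTo>12 Elm St</ShipTo>
  <Terms>N30</Terms>
</PurchaseOrder>"""

record, unmapped = to_local_record(INSTANCE)
```

The instance is well formed and would parse against its schema, yet `Terms` lands in `unmapped` because nothing tells the receiver what it means locally. A repository that hands out only the schema supplies the parsing half and leaves the mapping table to the reader.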
While the current sites are clearly intended to rise above the level of a yellow pages, neither has addressed the requirements for qualification or documentation of their wares.
So, what are they aiming at?