Cataloging XML Vocabularies
by Eric van der Vlist
XML Vocabulary Directory
Initializing a directory with the information gathered and the statistics mentioned above would already be valuable. A simple XSLT transformation can present what is known about a namespace as an RDDL document, readable by both humans and computer agents.
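To make the target format concrete, here is a minimal sketch of what such a generated document could contain. It uses Python's standard library rather than the XSLT transformation mentioned above, purely for illustration. The function name `build_rddl`, the `(title, href, nature)` tuple layout, and the document count are assumptions of this sketch, not part of the prototype; only the XHTML, RDDL, and XLink namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Ask ElementTree to serialize with conventional prefixes.
ET.register_namespace("", XHTML_NS)
ET.register_namespace("rddl", RDDL_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_rddl(namespace_uri, document_count, resources):
    """Return an RDDL-style XHTML document describing one namespace.

    `resources` is a list of (title, href, nature) tuples; this layout
    is a hypothetical choice made for the sketch.
    """
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    ET.SubElement(head, f"{{{XHTML_NS}}}title").text = namespace_uri
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    ET.SubElement(body, f"{{{XHTML_NS}}}h1").text = namespace_uri
    ET.SubElement(body, f"{{{XHTML_NS}}}p").text = (
        f"Found in {document_count} crawled documents."
    )
    for title, href, nature in resources:
        # In RDDL, xlink:role carries the "nature" of the related resource.
        res = ET.SubElement(body, f"{{{RDDL_NS}}}resource", {
            f"{{{XLINK_NS}}}type": "simple",
            f"{{{XLINK_NS}}}title": title,
            f"{{{XLINK_NS}}}href": href,
            f"{{{XLINK_NS}}}role": nature,
        })
        ET.SubElement(res, f"{{{XHTML_NS}}}p").text = title
    return ET.tostring(html, encoding="unicode")
```

Because each `rddl:resource` element wraps ordinary XHTML content, the same output can be displayed in a browser and parsed by a program looking for the XLink attributes.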
The first stage, using the data collected by the first version of the crawler, is pretty straightforward. Without any manual additions, our system would be able to present the namespace URI, its statistics, and a list of resources using the namespace.
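As a rough illustration of the data behind such a page, the sketch below groups crawled documents by the namespaces they use. It is not the crawler's actual code: the function names `namespaces_in` and `tally` and the in-memory `documents` mapping are assumptions, and only namespaces of element names are counted (attribute namespaces are ignored for brevity).

```python
import xml.etree.ElementTree as ET


def namespaces_in(xml_text):
    """Return the set of namespace URIs used by element names in a document."""
    uris = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{uri}local".
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            uris.add(elem.tag[1:].split("}", 1)[0])
    return uris


def tally(documents):
    """Map each namespace URI to the sorted URLs of documents that use it.

    `documents` maps a URL to its XML source (a hypothetical input shape).
    """
    usage = {}
    for url, xml_text in documents.items():
        for uri in namespaces_in(xml_text):
            usage.setdefault(uri, []).append(url)
    return {uri: sorted(urls) for uri, urls in usage.items()}
```

The length of each list gives a per-namespace document count, and the list itself is the "resources using the namespace" section of the page.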
In its simplest version, the table of contents for a document might look like the following:

With the statistics and usage showing the information retrieved on the Web:


If we had run this crawl several times, trends could be given as well, which would be useful for evaluating the dynamics of the namespace. For the moment, the first step in improving the document can simply be to include the comments made above on the statistics:

Next, there are two main directions in which the description of the namespace can be improved: adding more related resources (such as schemas, existing RDDL documents, or stylesheets) and adding more textual information.
Part of these additions could be found by an improved version of the crawler that does a specific analysis of well-known document types such as the different schema flavors, XSLT transformations, or RDDL documents.
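One way such an analysis could start is by recognizing a document from its root element, as in the sketch below. The `classify` function and the labels it returns are hypothetical. The table is deliberately incomplete: a RELAX NG schema, for instance, may have a root other than `grammar`, and an RDDL document is recognized here simply by the presence of an `rddl:resource` element.

```python
import xml.etree.ElementTree as ET

# Root element names (in ElementTree "{uri}local" form) of well-known types.
KNOWN_ROOTS = {
    "{http://www.w3.org/2001/XMLSchema}schema": "W3C XML Schema",
    "{http://relaxng.org/ns/structure/1.0}grammar": "RELAX NG",
    "{http://www.w3.org/1999/XSL/Transform}stylesheet": "XSLT",
    "{http://www.w3.org/1999/XSL/Transform}transform": "XSLT",
}
RDDL_RESOURCE = "{http://www.rddl.org/}resource"


def classify(xml_text):
    """Return a label for a well-known document type, or None if unrecognized."""
    root = ET.fromstring(xml_text)
    if root.tag in KNOWN_ROOTS:
        return KNOWN_ROOTS[root.tag]
    # RDDL documents are XHTML pages containing rddl:resource elements.
    if root.find(f".//{RDDL_RESOURCE}") is not None:
        return "RDDL"
    return None
```

Once a document is classified, a type-specific step could extract more detail, such as the target namespace of a schema, and attach the document to the right namespace page as a related resource.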
Another encouraging factor is that the number of namespaces is several orders of magnitude smaller than the number of pages on the Web and that the amount of work is in no way comparable to what has been done by, for example, DMoz and its 3,274,639 sites and 47,324 editors.
So, with a minimal amount of research and editing, our table of contents becomes:

With the addition of a simple "description" section:

And a short list of resources:

Our document now contains most of what an XLink newbie needs to start working on the subject. What can we add? How about news? That's trivial, assuming we can find a syndication channel such as the one available on XMLhack.
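A minimal sketch of that step follows. It assumes the channel is an RSS 1.0 (RDF) feed already downloaded as a string, and that a plain keyword match on titles and descriptions is good enough. The function name `news_about` is hypothetical, and the channel format is an assumption of this sketch rather than a statement about any particular site.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"  # RSS 1.0 element namespace


def news_about(rss_text, keyword):
    """Return (title, link) pairs for feed items that mention `keyword`."""
    needle = keyword.lower()
    hits = []
    for item in ET.fromstring(rss_text).iter(f"{{{RSS_NS}}}item"):
        title = item.findtext(f"{{{RSS_NS}}}title", default="")
        link = item.findtext(f"{{{RSS_NS}}}link", default="")
        description = item.findtext(f"{{{RSS_NS}}}description", default="")
        # Case-insensitive match on the title and the description.
        if needle in f"{title} {description}".lower():
            hits.append((title, link))
    return hits
```

Each returned pair can then be written out as one more related resource in the namespace page.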
This gives us a new section for our document, with the latest news:

We now have a single point of entry giving a huge amount of information on XLink, building on what we've found on the Web, existing resources such as XMLhack, and a minimal amount of human intervention to glue it together.
Search
Search is the part of this project for which I have no concrete material to show. However, my experience with XMLfr is that a standard search engine on a specialized technical site gives pretty good results (and this should be the case here too), especially when it is augmented by a Topic Map that classifies the available resources and aids navigation among the different topics.
In the XML domain, such a site could use the work done by the OASIS XMLvoc (Vocabulary for XML Standards and Technologies) Technical Committee, the goal of which is to "define a vocabulary for the domain of XML standards and technologies, which will provide a reference set of topics, topic types, and association types that will enable common access layers and thus improved findability for all types of information relating to XML, related standards, and the XML community. The vocabulary items will be defined as Published Subjects, following the recommendations of the OASIS Topic Maps Published Subjects Technical Committee".
Conclusion
Both the technology and the information are available to fix the current shortage of relevant and coherent information about XML vocabularies used on the Web. I believe that this prototype and the ideas behind it may be a foundation for a very useful site and observation platform for the use of XML. I welcome contact from interested parties.
