XML.com: XML From the Inside Out
oreilly.comSafari Bookshelf.Conferences.

advertisement

RDF Applications with Prolog
by Bijan Parsia | Pages: 1, 2, 3, 4, 5, 6

A Transformation Example

The XML serialization of RDF (known as RDF/XML) is both a boon and a bane. Part of the bane is common to many XML-based languages: it's just plain ol' nasty to read and write. Part of the boon is, of course, that there is an ever-growing pool of tools and techniques for dealing with XML-based languages. One of the foundational techniques for dealing with XML is the structural transformation of XML documents. The standard (if not the best) tool is XSLT. Thanks to RDF/XML, one can manage RDF documents/knowledge bases using XSLT stylesheets.

Prolog, as a symbolic processing language, is perfectly capable of doing straightforward tree transformations, and this a perfectly reasonable way to use it. However, it's not quite in the spirit of RDF. Structural transformation is syntactically focused (albeit on abstract syntax), whereas RDF is supposed to be the foundation of the "Semantic" Web. Given such, one might expect it to take an inference-based approach. With structural transformation, one concentrates on the shape of the tree. With inference based "transformation", one concentrates on the implications of the knowledge base.

What does this mean in practice? When I sit down to write an XSLT script to generate HTML from an RSS 1.0 document, there are two basic sorts of where-ish questions I ask: "Where do I find the elements I want in the source tree?" and "Where do they go in the result tree?" Answers to the first sort of question are typically going to be XPath queries, whereas answers to the second will be templates, element constructors, and the like. Though in an XSLT script, querying and result building are intertwined, clearly they are conceptually distinct aspects of the transform. But in each case, I'm still walking the trees. My script is sensitive to the position of the elements (and attributes, and...) in the tree, and not to their meaning.

Contrariwise, when writing a Prolog script to generate HTML from an RSS 1.0 document, I don't care about how the elements were arranged -- what was which's parent, and so on -- but rather on what the document expresses, and what conclusions I could draw from it. At least that's true for the querying part: since HTML isn't itself a knowledge-oriented system, a structural approach to its generation makes perfect sense. It would be different if I were "transforming" one knowledge base into another. In that case, I would conceive the problem in terms of how the second knowledge base should learn from the first. To help us with the problem of creating a structured result, Prolog (and SWI-Prolog in particular) has a wonderful facility for generating result trees, the Definite Clause Grammar (DCG). My use of DCGs to generate HTML is precisely analogous to using XSLT templates, with Prolog predicates serving the role of XPath expressions.

Definite Clause Grammars

Generally, one uses a declarative language when one has a problem that's well described by some formalism, and it is very tedious to generate, by hand, a computer program that will implement such descriptions. For example, the parser generator YACC implements a declarative language for describing context free grammars (CFGs), and it will generate the requisite nasty C code for a parser that will recognize sentences which adhere to those grammars. In essence, one tells YACC "write me a parser that can parse sentences conforming to this grammar". And one does this because it's relatively easy to understand and modify the formalistic description, especially when compared to hacking around with large symbol tables for finite state automata. An even more familiar formalism is that of regular expressions.

The formalism that Prolog implements is extremely expressive, so much so that it can pretty much directly express CFGs. That is, it's straightforward to write a Prolog program that looks very much like a standard BNFesque description of a CFG. Being a Prolog program, it is executable, and will happily parse sentences that conform to the grammar. It can even generate conformant sentences, given the right seeds. This suggests that in order to generate HTML, I imagine what I want the resultant HTML page (think of the page as a "sentence") to look like, write a grammar for that page just as if I were trying to parse some similar page to scrape it for data, and feed that Prolog grammar the bits it needs to write a such a page, only using the data I supply (in this case, via queries of my knowledge base created with the RSS document).

While I could write the grammar in straight Prolog, there are several bits of housekeeping a straight Prolog program would require that will just clutter things up. Thus, most Prolog systems add a bit of syntactic sugar to manage the repetitive stuff, the DCG notation. SWI-Prolog goes further and provides a DCG library for HTML generation, html_write.

Pages: 1, 2, 3, 4, 5, 6

Next Pagearrow