XML.com: XML From the Inside Out

RDF Applications with Prolog
by Bijan Parsia

To build the HTML page, I need to write a grammar describing the structure of that page (actually, that class of pages; the grammar is very much like a template). For our RSS-to-HTML renderer, the first step is to come up with a symbol which represents the page as a whole (the "start" non-terminal); in this case I'll use rss_html_list. Any rss_html_list is going to consist of a page of HTML, obviously, and html_write gives me a grammar rule for generating HTML pages, page/2. So, my template's basic form looks like:

rss_html_list -->
        page(...,
             ...).

The "-->" is the DCG operator; it works somewhat analogously to the ":-" operator, except that the body of the rule normally expects DCG predicates (i.e., ones defined with "-->") instead of regular Prolog predicates. "-->" can be read as "expands to" or "consists of". The first argument of page/2 specifies the HTML <head> content and the second the <body> content. If I were constructing a full grammar for rss_html_list, I'd expect to find a rule in it with page as its head. This rule is already defined by html_write, so I don't have to do it myself. The arguments allow me to customize the page production rules without modifying any of its actual clauses. Instead, I pass it little chunks of grammar (encapsulated as DCGs) which page tucks into the right places in its definition:

rss_html_list -->
        page(\rss_list_head,
             \rss_list_body).

(The leading backslash is a convention of the html_write library that indicates that the rest of the term is a call to a grammar rule.)
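To see the skeleton produce output, here is a minimal sketch, assuming SWI-Prolog with library(http/html_write) installed. The names demo_head, demo_body, demo_page, and show_demo_page are hypothetical placeholders of mine, not part of the library or of the renderer being built here. phrase/2 expands the grammar into a list of tokens, and print_html/1 writes those tokens out as HTML text.

```prolog
:- use_module(library(http/html_write)).

% Hypothetical placeholder rules standing in for the real head and body.
demo_head --> html([title('Demo page')]).
demo_body --> html([p('Hello from the body')]).

% The page as a whole: page/2 receives the two sub-grammars,
% each marked with the leading backslash.
demo_page -->
    page(\demo_head, \demo_body).

% Expand the grammar into a token list, then print it as HTML text.
show_demo_page :-
    phrase(demo_page, Tokens),
    print_html(Tokens).
```

Calling show_demo_page at the toplevel should print a complete HTML document whose head contains the title and whose body contains the paragraph.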

The "head" of the final page needs a title element. To create HTML elements, html_write supplies a DCG predicate, html/1, which takes a list of element "specifications" (the leading backslash convention is one kind of specification that html/1 understands). The specification for a simple element with just textual content is of the form element_name(content).

rss_list_head -->
          html([title('XMLhack')]).

It's a little annoying to have the title hardcoded like that, so I'll add an argument to the predicate:

rss_list_head(PageTitle) -->
        html([title(PageTitle)]).

This will do for the head. I want the body of the page to have a header, centered, and then an unordered list of the RSS 1.0 items. Since I want to use the title of the page as the header of the body as well, I'll give rss_list_body an argument too.

rss_list_body(Header) -->
  html([h2([align=center],[Header]),
  %Note that the "h2" spec takes *two* lists as args,
  %the first being the attributes, which can be spec'ed
  %either as name=value, or name(value).
  ul(\rss_list_items(Items))]).
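The two attribute spellings mentioned in the comment can be checked against each other. The following is a small sketch, assuming SWI-Prolog's library(http/html_write); render_spec/2 is a hypothetical helper of mine that renders any element specification to a string so that two specifications can be compared.

```prolog
:- use_module(library(http/html_write)).

% Hypothetical helper: render an html/1 specification into a string.
render_spec(Spec, String) :-
    phrase(html(Spec), Tokens),
    with_output_to(string(String), print_html(Tokens)).

% Succeeds if both attribute notations render to the same text.
same_rendering :-
    render_spec(h2([align=center],  ['Title']), A),
    render_spec(h2([align(center)], ['Title']), B),
    A == B.
```

If same_rendering succeeds, the choice between name=value and name(value) is purely a matter of taste.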

Both the head and body clauses need to get a value passed to them. In this case, I'm going to pass the buck up to the caller of rss_html_list:

rss_html_list(ChannelTitle) -->
        page(\rss_list_head(ChannelTitle),
             \rss_list_body(ChannelTitle)).

That's fine for the page head, but the body clause still has that mysterious variable Items. To fill in that hole I need to make a query, and I intend to use rdf/3, a normal Prolog predicate, not a DCG rule, to make it. Thus, some sort of escaping device needs to be employed. For DCGs, stuff inside curly brackets {} is not treated as DCG clauses (and not expanded in the usual way -- see any Prolog text for details):

rss_list_body(Header) -->
  {...},
  %A query goes in there! Some predicates which bind Items.

  html([h2([align=center],[Header]),
  ul(\rss_list_items(Items))]).
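The curly-bracket escape can be tried in isolation with a plain DCG, no HTML involved. In this sketch the facts item_title/2 and the rule title_token/1 are hypothetical names of mine; the fact table stands in for whatever query will eventually bind Items.

```prolog
% Hypothetical data standing in for the results of a real query.
item_title(item1, 'First story').
item_title(item2, 'Second story').

% The goal inside {} is called as ordinary Prolog and contributes no
% tokens itself; the list [Title] is what the rule actually emits.
title_token(Item) -->
    { item_title(Item, Title) },
    [Title].
```

With these definitions, phrase(title_token(item1), Tokens) binds Tokens to ['First story'].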

I expect that the query which binds Items will bind a list of items, so rss_list_items needs to handle it. A simple recursive rule will take care of that:

rss_list_items([First_item|Rest_of_items]) -->
  html([li([\list_item_content(First_item)])]),
  rss_list_items(Rest_of_items).

rss_list_items([]) --> [].
%The base case: If the list is empty, emit nothing.
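The same recursive pattern can be exercised on its own with plain text items. This is a sketch assuming SWI-Prolog's library(http/html_write); demo_items, demo_list, and render_demo_list are hypothetical names of mine, and the items are simple atoms rather than RSS resources.

```prolog
:- use_module(library(http/html_write)).

% Base case: an empty list contributes no tokens.
demo_items([]) --> [].
% Recursive case: one li element for the first text, then the rest.
demo_items([Text|Rest]) -->
    html([li([Text])]),
    demo_items(Rest).

% Wrap the generated items in an unordered list.
demo_list(Texts) -->
    html([ul(\demo_items(Texts))]).

% Hypothetical helper: render the list to a string for inspection.
render_demo_list(Texts, String) :-
    phrase(demo_list(Texts), Tokens),
    with_output_to(string(String), print_html(Tokens)).
```

Rendering [alpha, beta] should yield a ul element containing one li per list member, while rendering [] yields a ul with no li elements at all.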

The final two production rules are simple, although both involve queries:

list_item_content(Item) -->
  {...}, %A query that fetches the Item's Description.
  html([\item_link(Item), br([]), Description]).

item_link(Item) -->
  {...}, %A query that fetches the item's Link and Title.
  html(i(a(href(Link),Title))).

That's it for the template aspect of the grammar.

Back to the Queries

The grammar needs three queries to be complete. The content and link queries are quite straightforward. I only expect one result in each case, so simple calls to rdf/3 will do the job:

list_item_content(Item) -->
  {rdf(Item, dc:description, literal(Description))},
  html([\item_link(Item), br([]), Description]).

item_link(Item) -->
  %This is a conjunctive query, though each conjunct is independent.
  {rdf(Item,rss:link,literal(Link)),
   rdf(Item,rss:title,literal(Title))},
  html(i(a(href(Link),Title))).

These queries are very simple. No need to walk up or down a tree. No need to think of how the information is encoded. Items have dc:descriptions, rss:links, and rss:titles -- to find out the description, link, and title for an item, we just ask.

Alas, this isn't entirely the case, as I'm sticking pretty close to the bare RDF metal here, and even to the particular representation of RDF given in rdf_db. I could encapsulate the ways of determining the title and links of an RSS item in more pleasing predicates, which would also allow us to change the way they were determined without affecting our template.

rss_title(Item, Title) :- 
    rdf(Item,rss:title,literal(Title)).
rss_link(Item, Link) :-
    rdf(Item,rss:link,literal(Link)).
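To illustrate how the template could lean on these wrappers, here is a sketch assuming SWI-Prolog with library(semweb/rdf_db) and library(http/html_write). The rss prefix registration uses the RSS 1.0 namespace URI, which I am assuming is not already registered in your environment; load_demo_item/1 and its example.org triples are hypothetical sample data of mine.

```prolog
:- use_module(library(http/html_write)).
:- use_module(library(semweb/rdf_db)).

% Register the prefix before the clauses that use rss:... are compiled.
:- rdf_register_prefix(rss, 'http://purl.org/rss/1.0/').

rss_title(Item, Title) :-
    rdf(Item, rss:title, literal(Title)).
rss_link(Item, Link) :-
    rdf(Item, rss:link, literal(Link)).

% The template now asks for the link and title without mentioning rdf/3.
item_link(Item) -->
    { rss_link(Item, Link),
      rss_title(Item, Title) },
    html(i(a(href(Link), Title))).

% Hypothetical sample data standing in for a parsed feed.
load_demo_item(Item) :-
    Item = 'http://example.org/item1',
    rdf_assert(Item, rss:title, literal('Demo title')),
    rdf_assert(Item, rss:link, literal('http://example.org/story')).
```

If the representation of titles or links in the store ever changes, only rss_title/2 and rss_link/2 would need to be touched; item_link stays as it is.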

The query for rss_list_body is more complex. The basic form is clear: I want the rss:items, so I ask for rdf(Item, rdf:type, rss:item). But on its own this only gives me the first rss:item found (and further ones, one at a time, on backtracking). I need to say, "Give me, in a list, all the values that satisfy this query". There are several predicates that have this or similar meaning. In this case, I'll use setof/3, which has the additional virtue of eliminating duplicates:

setof(Item, %The variable which gets bound to the desired value.
      rdf(Item, rdf:type, rss:item), %The query.
      Items) %The variable that gets bound to a list of the results.

This, when popped into rss_list_body, completes the transforming grammar.
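The behavior of the all-solutions query can be seen without an RDF store. In this sketch the typed/2 facts are hypothetical stand-ins of mine for rdf(Item, rdf:type, rss:item), and findall/3 is shown only for contrast with the standard setof/3.

```prolog
% Hypothetical facts standing in for type triples in the store.
typed(item_b, item).
typed(item_a, item).
typed(item_b, item).        % deliberate duplicate
typed(channel_1, channel).

% setof/3: all solutions as a sorted list without duplicates.
items_set(Items) :-
    setof(Item, typed(Item, item), Items).

% findall/3: all solutions in the order found, duplicates kept.
items_bag(Items) :-
    findall(Item, typed(Item, item), Items).
```

Here items_set(Items) gives [item_a, item_b], while items_bag(Items) gives [item_b, item_a, item_b]. One design point worth keeping in mind: setof/3 fails when the query has no solutions, whereas findall/3 yields the empty list, so a body rule built on setof/3 would fail for a feed with no items.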
