RDF Applications with Prolog
by Bijan Parsia
To build the HTML page, I need to write a grammar describing the
structure of that page (actually, that class of pages; the grammar is
very much like a template). For our RSS->HTML renderer, the first
step is to come up with a symbol that represents the page as a whole
(the "start" non-terminal); in this case I'll use
rss_html_list. Any rss_html_list is going to
consist of a page of HTML, obviously, and html_write
gives me a grammar rule for generating HTML pages,
page/2. So, my template's basic form looks like:
rss_html_list --> page(..., ...).
The "-->" is the DCG operator; it works somewhat
analogously to the ":-" operator, except that the body of
the rule expects DCG non-terminals (i.e., ones defined with
"-->") instead of regular Prolog predicates. In the
normal case, "-->" can be read as "expands to" or
"consists of". The first argument of page/2 specifies the HTML
<head> content and the second the
<body> content. If I were constructing a full
grammar for rss_html_list, I'd expect to find a rule in
it with page as its head. This rule is already defined by
html_write, so I don't have to do it myself. The arguments
allow me to customize the page production rules without
modifying any of their actual clauses. Instead, I pass it little chunks
of grammar (encapsulated as DCGs) which page tucks into
the right places in its definition:
rss_html_list -->
page(\rss_list_head,
\rss_list_body).
(The leading backslash is a convention of the html_write library
indicating that the term after it names a grammar rule to call at that point.)
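To see page/2 and the backslash convention working end to end, here is a minimal sketch assuming SWI-Prolog with library(http/html_write). The rules demo_head, demo_body and demo_page and the helper render_demo/1 are hypothetical stand-ins with fixed text, not part of the RSS renderer; phrase/2 runs the grammar to produce a token list and print_html/1 writes that list out as HTML.

```prolog
:- use_module(library(http/html_write)).

% Hypothetical stand-ins for rss_list_head and rss_list_body.
demo_head --> html(title('Demo')).
demo_body --> html(p('Hello from the body')).

% The start non-terminal: page/2 takes the <head> content and the
% <body> content; \Rule tells html_write to call Rule as a grammar rule.
demo_page --> page(\demo_head, \demo_body).

% Run the grammar to get a token list, then print it as HTML text,
% captured here in a string.
render_demo(HTML) :-
    phrase(demo_page, Tokens),
    with_output_to(string(HTML), print_html(Tokens)).
```

Calling render_demo(S) should bind S to a complete document whose <head> holds the title and whose <body> holds the paragraph.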
The "head" of the final page needs a title element. To create HTML
elements, html_write supplies a DCG rule for this purpose,
html/1, which takes a list of element "specifications"
(the leading backslash convention is one kind of specification that
html/1 understands). The specification for a simple
element with just textual content is of the form
element_name(content).
rss_list_head -->
html([title('XMLhack')]).
It's a little annoying to have the title hardcoded like that, so I'll add an argument to the predicate:
rss_list_head(PageTitle) -->
html([title(PageTitle)]).
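A quick way to check the parameterized head rule on its own is to run it through phrase/2 and print the result. This sketch assumes SWI-Prolog's library(http/html_write); render/2 is a hypothetical helper, not part of the library. Because html/1 escapes text content, a title containing & should come out as &amp;.

```prolog
:- use_module(library(http/html_write)).

% The parameterized head rule from the text.
rss_list_head(PageTitle) -->
    html([title(PageTitle)]).

% Hypothetical helper: run any grammar rule and capture the HTML it prints.
render(Rule, String) :-
    phrase(Rule, Tokens),
    with_output_to(string(String), print_html(Tokens)).
```

For example, render(rss_list_head('XMLhack'), S) gives a string containing <title>XMLhack</title>.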
This will do for the head. I want the body of the page to have a
header, centered, and then an unordered list of the RSS 1.0
items. Since I want to use the title of the page as the header of the
body as well, I'll give rss_list_body an argument
too.
rss_list_body(Header) -->
html([h2([align=center],[Header]),
%Note that the "h2" spec takes *two* lists as args,
%the first being the attributes, which can be spec'ed
%either as name=value, or name(value).
ul(\rss_list_items(Items))] ).
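The comment in rss_list_body says attributes can be written either as name=value or as name(value). A small sketch, assuming SWI-Prolog's library(http/html_write), renders the centered header both ways so the two spellings can be compared; header_eq, header_fn and render/2 are hypothetical names.

```prolog
:- use_module(library(http/html_write)).

% The same h2 element, with its attribute list spelled two ways.
header_eq(Header) --> html(h2([align=center], [Header])).
header_fn(Header) --> html(h2([align(center)], [Header])).

% Hypothetical helper: run a grammar rule and capture the printed HTML.
render(Rule, String) :-
    phrase(Rule, Tokens),
    with_output_to(string(String), print_html(Tokens)).
```

Both rules should produce the same markup: an h2 start tag carrying align="center", followed by the header text.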
Both the head and body clauses need to get a value passed to
them. In this case, I'm going to pass the buck up to the caller of
rss_html_list:
rss_html_list(ChannelTitle) -->
page(\rss_list_head(ChannelTitle),
\rss_list_body(ChannelTitle)).
That's fine for the page head, but the body clause still has that
mysterious variable Items. To fill in that hole I need to
make a query, and I intend to use rdf/3, a normal Prolog
predicate, not a DCG rule, to make it. Thus, some sort of escaping
device needs to be employed. In a DCG body, goals inside curly brackets
{} are run as ordinary Prolog goals rather than treated as grammar rules (they are not
expanded in the usual way; see any Prolog text for details):
rss_list_body(Header) -->
{...},
%A query goes in there! Some predicates which bind Items.
html([h2([align=center],[Header]),
ul(\rss_list_items(Items))] ).
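The curly-bracket escape is easiest to see with a goal that has nothing to do with RDF. In this sketch (assuming SWI-Prolog's library(http/html_write); item_count and render/2 are hypothetical), the goals inside {} compute a value and bind a variable, and the html/1 call after them uses it, which is the same role the missing query plays for Items.

```prolog
:- use_module(library(http/html_write)).

% Goals inside {} run as plain Prolog and produce no HTML themselves;
% they just bind Msg for the html/1 call that follows.
item_count(Items) -->
    { length(Items, N),
      format(atom(Msg), 'Number of items: ~d', [N]) },
    html(p(Msg)).

% Hypothetical helper: run a grammar rule and capture the printed HTML.
render(Rule, String) :-
    phrase(Rule, Tokens),
    with_output_to(string(String), print_html(Tokens)).
```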
I expect that the query which binds Items will bind a
list of items, so rss_list_items needs to handle it. A
simple recursive rule will take care of that:
rss_list_items([First_item|Rest_of_items]) -->
html([li([\list_item_content(First_item)])]),
rss_list_items(Rest_of_items).
rss_list_items([]) --> []. %The base case: an empty list generates nothing.
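The recursion can be tried out before any RDF is involved. This sketch assumes SWI-Prolog's library(http/html_write) and swaps list_item_content for a hypothetical stand-in that prints each item as plain text; demo_items, demo_list and render/2 are illustrative names only.

```prolog
:- use_module(library(http/html_write)).

% Same shape as rss_list_items: one li per list element, then recurse.
demo_items([First|Rest]) -->
    html(li(First)),
    demo_items(Rest).
demo_items([]) --> [].        % empty list: generate no tokens

% Wrap the items in an unordered list, as rss_list_body does.
demo_list(Items) -->
    html(ul(\demo_items(Items))).

% Hypothetical helper: run a grammar rule and capture the printed HTML.
render(Rule, String) :-
    phrase(Rule, Tokens),
    with_output_to(string(String), print_html(Tokens)).
```

Rendering demo_list([one, two, three]) should yield a ul containing three li elements.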
The final two production rules are simple, although both involve queries:
list_item_content(Item) -->
{...}, %A query that fetches the Item's Description.
html([\item_link(Item), br([]), Description]).
item_link(Item) -->
{...}, %A query that fetches the item's Link and Title.
html(i(a(href(Link),Title))).
That's it for the template aspect of the grammar.
Back to the Queries
The grammar needs three queries to be complete. The content and
link queries are quite straightforward. I only expect one result in
each case, so simple calls to rdf/3 will do the job:
list_item_content(Item) -->
{rdf(Item, dc:description, literal(Description))},
html([\item_link(Item), br([]), Description]).
item_link(Item) -->
%This is a conjunctive query, though each conjunct is independent.
{rdf(Item,rss:link,literal(Link)),
rdf(Item,rss:title,literal(Title))},
html(i(a(href(Link),Title))).
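To exercise these two rules, the triple store needs some data. The sketch below assumes SWI-Prolog's library(semweb/rdf_db) and library(http/html_write), registers the RSS 1.0 namespace (http://purl.org/rss/1.0/) under the prefix rss and Dublin Core under dc, and asserts one made-up item by hand with rdf_assert/3 instead of parsing a feed; load_sample/0, render/2 and the example.org URIs are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(http/html_write)).

% Prefixes must be known before clauses that use them are compiled.
:- rdf_register_prefix(rss, 'http://purl.org/rss/1.0/').
:- rdf_register_prefix(dc, 'http://purl.org/dc/elements/1.1/').

% Hypothetical sample data: one RSS item asserted directly.
load_sample :-
    Item = 'http://example.org/item1',
    rdf_assert(Item, rdf:type, rss:item),
    rdf_assert(Item, rss:title, literal('An item')),
    rdf_assert(Item, rss:link, literal('http://example.org/item1')),
    rdf_assert(Item, dc:description, literal('A short description.')).

% The two rules from the text.
list_item_content(Item) -->
    { rdf(Item, dc:description, literal(Description)) },
    html([\item_link(Item), br([]), Description]).

item_link(Item) -->
    { rdf(Item, rss:link, literal(Link)),
      rdf(Item, rss:title, literal(Title)) },
    html(i(a(href(Link), Title))).

% Hypothetical helper: run a grammar rule and capture the printed HTML.
render(Rule, String) :-
    phrase(Rule, Tokens),
    with_output_to(string(String), print_html(Tokens)).
```

After load_sample, rendering list_item_content('http://example.org/item1') should give an italic link to the item, a line break, and the description text.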
These queries are very simple. No need to walk up or down a
tree. No need to think of how the information is
encoded. Items have dc:descriptions,
rss:links, and rss:titles -- to find out the
description, link, and title for an item, we just ask.
Alas, this isn't entirely the case, as I'm sticking pretty close to the bare RDF metal here, and even to the particular representation of RDF given in rdf_db. I could encapsulate the ways of determining the title and link of an RSS item in more pleasing predicates, which would also let us change how they are determined without affecting our template:
rss_title(Item, Title) :-
rdf(Item,rss:title,literal(Title)).
rss_link(Item, Link) :-
rdf(Item,rss:link,literal(Link)).
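With rss_title/2 and rss_link/2 in place, item_link no longer has to mention rdf/3 or the rss: namespace at all. The sketch below is a hypothetical variant of item_link written against the wrappers. It assumes SWI-Prolog's library(semweb/rdf_db) and library(http/html_write), registers the RSS 1.0 namespace under the prefix rss, and asserts a made-up item by hand; load_sample/0, render/2 and the example.org URIs are illustrative only.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(http/html_write)).

:- rdf_register_prefix(rss, 'http://purl.org/rss/1.0/').

% The wrappers from the text.
rss_title(Item, Title) :-
    rdf(Item, rss:title, literal(Title)).
rss_link(Item, Link) :-
    rdf(Item, rss:link, literal(Link)).

% Hypothetical variant of item_link that asks the wrappers instead of
% calling rdf/3 directly.
item_link(Item) -->
    { rss_link(Item, Link),
      rss_title(Item, Title) },
    html(i(a(href(Link), Title))).

% Hypothetical sample data: one item with a title and a link.
load_sample :-
    Item = 'http://example.org/item1',
    rdf_assert(Item, rss:title, literal('An item')),
    rdf_assert(Item, rss:link, literal('http://example.org/item1')).

% Hypothetical helper: run a grammar rule and capture the printed HTML.
render(Rule, String) :-
    phrase(Rule, Tokens),
    with_output_to(string(String), print_html(Tokens)).
```

If titles later had to come from somewhere else, say a dc:title, only rss_title/2 would need another clause; item_link would stay as it is.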
The query for rss_list_body is more complex. The basic
form is clear: I want the rss:items, so I ask for
rdf(Item, rdf:type, rss:item). But on its own this yields only one
rss:item at a time (the others come only on backtracking). I need to say, "Give me, in a list,
all the values that satisfy this query". There are several predicates
that have this or similar meaning. In this case, I'll use
setof/3, which has the additional virtue of eliminating
duplicates (it also sorts the results):
setof(Item, %The variable which gets bound to the desired value.
rdf(Item, rdf:type, rss:item), %The query.
Items) %The variable that gets bound to a list of the results.
This, when popped into the curly brackets of rss_list_body, completes the
transforming grammar.
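Putting the pieces together gives a grammar that can be run end to end. The sketch below assumes SWI-Prolog with library(semweb/rdf_db) and library(http/html_write), binds the prefix rss to the RSS 1.0 namespace (http://purl.org/rss/1.0/) and dc to Dublin Core, and fills the triple store with two made-up items via rdf_assert/3 rather than loading a real feed; add_item/2, load_sample/0, render_page/2 and the example.org URIs are hypothetical. The grammar rules themselves follow the text, with setof/3 placed inside the curly brackets of rss_list_body.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(http/html_write)).

% Prefixes must be registered before the clauses below are compiled.
:- rdf_register_prefix(rss, 'http://purl.org/rss/1.0/').
:- rdf_register_prefix(dc, 'http://purl.org/dc/elements/1.1/').

% The transforming grammar.
rss_html_list(ChannelTitle) -->
    page(\rss_list_head(ChannelTitle),
         \rss_list_body(ChannelTitle)).

rss_list_head(PageTitle) -->
    html([title(PageTitle)]).

rss_list_body(Header) -->
    { setof(Item, rdf(Item, rdf:type, rss:item), Items) },
    html([ h2([align=center], [Header]),
           ul(\rss_list_items(Items))
         ]).

rss_list_items([First_item|Rest_of_items]) -->
    html([li([\list_item_content(First_item)])]),
    rss_list_items(Rest_of_items).
rss_list_items([]) --> [].

list_item_content(Item) -->
    { rdf(Item, dc:description, literal(Description)) },
    html([\item_link(Item), br([]), Description]).

item_link(Item) -->
    { rdf(Item, rss:link, literal(Link)),
      rdf(Item, rss:title, literal(Title)) },
    html(i(a(href(Link), Title))).

% Hypothetical sample data: each item links to its own URI.
add_item(Item, Title) :-
    rdf_assert(Item, rdf:type, rss:item),
    rdf_assert(Item, rss:title, literal(Title)),
    rdf_assert(Item, rss:link, literal(Item)),
    rdf_assert(Item, dc:description, literal('No description yet.')).

load_sample :-
    add_item('http://example.org/item1', 'First item'),
    add_item('http://example.org/item2', 'Second item').

% Hypothetical helper: generate the whole page and capture it as a string.
render_page(Title, HTML) :-
    phrase(rss_html_list(Title), Tokens),
    with_output_to(string(HTML), print_html(Tokens)).
```

One behavior worth knowing: setof/3 fails, rather than returning [], when nothing matches, so with an empty store rss_list_body (and with it the whole page) fails. Using findall/3 instead would render an empty list in that case, but findall/3 neither sorts nor removes duplicates.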