XML.com: XML From the Inside Out

Introducing OpenSearch

July 24, 2007

Search and web feeds go together pretty naturally, as anyone who has set up some kind of vanity search feed knows. You go to your favorite Web 2.0 search engine and set up a query such as http://web20.example.com/search=john+doe&output=atom to search for "john doe," but rather than getting the results back as the usual HTML web page, you get them back in Atom format. You can subscribe to this URL in your favorite feed reader, and you have all the useful features of web feeds attached to this search query. Most notably, rather than having to poll the search engine yourself and remember which results you have seen, your reader will simply alert you when there are new results. This simple but very useful concept is the core idea behind the OpenSearch specification.

OpenSearch was originally developed at Amazon.com's A9 incubator. It's a specification under the Creative Commons Attribution-ShareAlike License, covering discovery and description documents for search engines, expression of queries, and the convention of RSS 2.0 or Atom Web feed results. It is very RESTful in nature and complementary to the Atom Publishing Protocol (APP). In fact, many have called for OpenSearch to serve as the query aspect of APP, which provides a way to access identified or located results, but no mechanism for ad hoc query. With all this affinity to Atom and REST, OpenSearch is a natural topic for this Agile Web column. OpenSearch 1.0 is still the latest full version; it has been around since 2005. Version 1.1 is in beta, but has some important improvements and is thus the version I'll be discussing.

Finding a Suitable Search Engine

Once you've found a search engine, the first issue is learning more about it and, in particular, how to query it. The OpenSearch description document format is designed to provide this information. Listing 1 is a simple example describing a fictitious search engine for XML.com.

Listing 1: OpenSearch description document

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <ShortName>XML.com search</ShortName>
  <dc:relation href="http://www.xml.com"/>
  <Description>Search XML.com articles and Weblogs</Description>
  <Tags>xml web</Tags>
  <Contact>admin@xml.com</Contact>
  <Url type="application/atom+xml"
   template="http://search.xml.com/?q={searchTerms}&amp;p={startPage?}&amp;format=atom"/>
  <Query role="example" searchTerms="test+xml"/>
  <Attribution>All content Copyright 2007, O'Reilly and Associates</Attribution>
</OpenSearchDescription>

It's pretty straightforward stuff for the most part. Elements such as ShortName and Description provide basic information for people who are browsing search engine information. Tags and Attribution offer additional details that help when deciding whether to use the search engine. Url is an interesting element. It tells the search client which URL forms can be used to query the search engine. In this way it connects to another important section of the OpenSearch specification, URL template syntax, which I'll discuss in a later section. Query is another special element that, in this case, tells search clients that they can test the search engine by querying with the search terms "test+xml"; the test purpose is indicated by role="example". Query elements are more broadly used in OpenSearch results, as I'll discuss in a later section.
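
To make the role of Url a little more concrete, here is a minimal Python sketch of what a search client might do with a description document like Listing 1: find the template for a given media type, then fill in the parameters. The function names find_template and expand are hypothetical, and the handling shown (URL-encoding values with quote_plus, replacing an unsupplied optional parameter such as {startPage?} with an empty string) is my reading of the 1.1 draft rather than a complete implementation of the template syntax.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Trimmed copy of Listing 1. Inside an attribute, "&" must be written "&amp;".
DESCRIPTION = """<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>XML.com search</ShortName>
  <Description>Search XML.com articles and Weblogs</Description>
  <Url type="application/atom+xml"
   template="http://search.xml.com/?q={searchTerms}&amp;p={startPage?}&amp;format=atom"/>
</OpenSearchDescription>
"""


def find_template(description_xml, media_type):
    """Return the template of the first Url element with the given type."""
    root = ET.fromstring(description_xml.encode("utf-8"))
    for url in root.findall(f"{{{OS_NS}}}Url"):
        if url.get("type") == media_type:
            return url.get("template")
    raise LookupError(f"no Url element of type {media_type}")


def expand(template, **params):
    """Fill in {name} and {name?} placeholders (hypothetical helper).

    Assumption: required parameters must be supplied; optional ones
    (marked with a trailing "?") become an empty string when missing.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z][A-Za-z0-9:]*)(\??)\}", substitute, template)


template = find_template(DESCRIPTION, "application/atom+xml")
print(expand(template, searchTerms="test xml"))
print(expand(template, searchTerms="test xml", startPage=2))
```

Run as is, this prints http://search.xml.com/?q=test+xml&p=&format=atom and then the same URL with p=2. Note that the parsed template contains a plain "&", because the XML parser has already resolved the "&amp;" escapes.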

Foreign Markup

Listing 1 also demonstrates how you can extend OpenSearch description syntax using the common mechanism of adding foreign elements in a separate namespace. In this case, there is a Dublin Core metadata element dc:relation to express a simple relationship between search.xml.com and www.xml.com. It's interesting that, besides Url and Query, all the elements in Listing 1 could be expressed in equivalent Atom syntax. Even the foreign dc:relation is similar to atom:link, and the latter provides a bit more expressiveness (though you can even things up a bit by using Dublin Core qualifiers). Listing 2 shows the search engine description from Listing 1 converted to Atom syntax; it is purely the envelope with no entries, which is perfectly legal in Atom.

Listing 2: Atom document with the equivalent information to the OpenSearch description document in Listing 1

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
  <id>http://search.xml.com</id>
  <link rel="self" href="http://search.xml.com"/>
  <link type="text/html" href="http://www.xml.com"/>
  <updated>2007-07-07T12:00:00Z</updated>
  <title>XML.com search</title>
  <subtitle>Search XML.com articles and Weblogs</subtitle>
  <author>
    <name>XML.com</name>
    <email>admin@xml.com</email>
  </author>
  <rights>All content Copyright 2007, O'Reilly and Associates</rights>
  <category term="xml"/>
  <category term="web"/>
  <os:Url type="application/atom+xml"
   template="http://search.xml.com/?q={searchTerms}&amp;p={startPage?}&amp;format=atom"/>
  <os:Query role="example" searchTerms="test+xml"/>
</feed>

There is no need for Dublin Core, in this case, given atom:link. But rather than abuse that element, Url is pulled in from the OpenSearch namespace to express the search URL template. The purpose of this example is not to disparage OpenSearch's choice in rolling its own format. I do believe that it's useful to reuse formats where possible, but I also think that it's important not to push reuse until you're stretching a format to an alien purpose. One could make an argument that Listing 2 stretches the purpose of Atom syntax too far.
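
For what it's worth, a client can consume the mixed vocabulary of Listing 2 without much trouble, as long as it looks elements up by namespace rather than by prefix. The following Python sketch, which uses a trimmed copy of Listing 2, pulls the native Atom fields and the foreign OpenSearch elements out of the same feed. The describe function and the keys of the dictionary it returns are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Prefixes here are local to this script; only the namespace URIs matter.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
}

# Trimmed copy of Listing 2.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
  <id>http://search.xml.com</id>
  <link rel="self" href="http://search.xml.com"/>
  <link type="text/html" href="http://www.xml.com"/>
  <updated>2007-07-07T12:00:00Z</updated>
  <title>XML.com search</title>
  <category term="xml"/>
  <category term="web"/>
  <os:Url type="application/atom+xml"
   template="http://search.xml.com/?q={searchTerms}&amp;p={startPage?}&amp;format=atom"/>
  <os:Query role="example" searchTerms="test+xml"/>
</feed>
"""


def describe(feed_xml):
    """Collect search engine metadata from an Atom envelope (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    return {
        # Native Atom elements
        "title": root.findtext("atom:title", namespaces=NS),
        "tags": [c.get("term") for c in root.findall("atom:category", NS)],
        "related": [
            link.get("href")
            for link in root.findall("atom:link", NS)
            if link.get("rel") != "self"
        ],
        # Foreign OpenSearch elements embedded in the feed
        "templates": {
            u.get("type"): u.get("template") for u in root.findall("os:Url", NS)
        },
        "example_terms": [
            q.get("searchTerms")
            for q in root.findall("os:Query", NS)
            if q.get("role") == "example"
        ],
    }


info = describe(FEED)
print(info["title"], info["tags"], info["related"])
print(info["templates"]["application/atom+xml"])
```

The same lookups work unchanged if the feed author picks a different prefix than os, since ElementTree matches on the namespace URI.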
