
Microcontent Management with Syncato

March 17, 2004

Kimbro Staken

In the past few years there's been a surge in the popularity of what has become known as blogging. The weblog (or "blog") is an online journal of links and information, and it seems everyone has one now. I've written a blog for several years, using Movable Type to manage it. Unfortunately, I've always been bothered by the rigid perspective imposed by most weblog management systems. Most of them are built on a relational database and offer a limited number of post types. I wanted something much more flexible that would let you define an arbitrary level of structure for the content you add to your site. This led me to develop the system now known as Syncato.

Weblogs with Syncato

To many people, weblogs are simply a way to post their writing to the Web. For me weblogs are a personal data repository where you can store all kinds of information that you want to make publicly available. This includes, of course, writing, but I also wanted to be able to store other types of information. It was this desire that drove the design of Syncato and brought about the focus on achieving maximum flexibility for the data it stores.

Syncato is an XML system from top to bottom. It's written in Python, but my goal has been to minimize the amount of Python code involved. All data is defined in XML and, as much as possible, XML tools are used to manipulate the data. It uses Sleepycat Berkeley DB XML as its database and converts the XML content into presentation formats via the libxslt library. When data moves from the database to presentation, it is always treated as XML; there is no object model per se, only an XML model.

While Syncato is based on XML, it does not use any of the schema constraint mechanisms that many people employ. I felt that any type of schema constraint would destroy the value of the system. It is intended to be an open container for your data. You should be able to improvise as you're creating your content and add whatever you want to the database without having to stop and modify a schema for the system. This doesn't mean the system is without rules; it just means that the system will accept any XML that you want to throw at it.

There are some rules, but they exist because they're required to be able to display the information correctly. For instance, if you want to create a new weblog post you have to add an XML document that has a root element named item and a child element named title. You're free to add other XML if you want, but it won't show up as a weblog post unless you follow these basic rules. Here's a simple weblog post in XML format.


<item>
  <title>A simple post</title>
  <description>
    <p>A simple post body.</p>
  </description>
  <category>Music</category>
  <pubDate seconds="1070159676.25">2003-11-29T19:34:36-07:00</pubDate>
</item>

There's nothing too special about that. Overall, the functionality of Syncato as a basic weblog system -- except for the pervasive use of XML -- isn't much different than that of systems like Movable Type. What's more interesting is what Syncato does that other systems can't.
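
To make the rule for weblog posts concrete, here is a minimal Python sketch of the check described above: a root element named item with a child element named title. The function name `is_weblog_post` is made up for illustration, and this is not Syncato's actual code; it only restates the rule using the standard library.

```python
import xml.etree.ElementTree as ET


def is_weblog_post(xml_text):
    """Return True if the document has an <item> root with a <title> child."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML at all, so it cannot be a post.
        return False
    return root.tag == "item" and root.find("title") is not None


post = ("<item><title>A simple post</title>"
        "<description><p>A simple post body.</p></description></item>")
note = "<songlist><title>blues and soul</title></songlist>"

print(is_weblog_post(post))
print(is_weblog_post(note))
```

Anything that fails the check is still perfectly good XML for the database; it just wouldn't be displayed as a weblog post.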

What is Microcontent?

Syncato started out as a weblog system, but my real goal when building it was to achieve something much more flexible. What I really wanted was to be able to define custom content types that could be embedded within posts to provide more advanced capabilities. For example, on my site I like to post lists of songs that form CD mixes. To do this I create a playlist in iTunes and then run a script to convert the playlist into a custom XML format. This XML is then included in a post to my weblog. Here's an excerpt from my favorite mix CD.


<songlist>
  <title>blues and soul</title>
  <description>A slow blues groove ending with a little fire.</description>
  <song>
    <title>On The Road Again</title>
    <artist>Canned Heat</artist>
    <album>Canned Heat</album>
    <length>3:23</length>
  </song>
  <song>
    <title>Crossroads</title>
    <artist>Tom Waits</artist>
    <album>The Black Rider</album>
    <length>2:43</length>
  </song>

  ...

  <song>
    <title>Pride And Joy</title>
    <artist>Various - Soundtrack for a Century</artist>
    <album>Folk, Gospel &amp; Blues: Will The Circle Be Unbroken (Disc 2)</album>
    <length>3:41</length>
  </song>
  <song>
    <title>thickfreakness</title>
    <artist>The Black Keys</artist>
    <album>thickfreakness</album>
    <length>3:48</length>
  </song>
</songlist>

I call this microcontent; it's the critical element that forms the core of Syncato's functionality. You're free to structure the content of your posts however you want, but if you add custom XML it won't show up on the site by default. To solve this, you simply create an XSL transform to convert your raw XML into whatever presentation format you want. For the CD mix I use a couple of XSLT templates to convert it into HTML for display on my site.


    <xsl:template match="songlist">
        <table class="table-border">
            <tr>
                <th>Song Name</th>
                <th>Artist</th>
                <th>Album</th>
                <th>Length</th>
            </tr>
            <xsl:apply-templates select="song"/>
        </table>
    </xsl:template>

    <xsl:template match="song">
        <xsl:variable name="style">
            <xsl:choose>
                <xsl:when test="position() mod 2 = 0">row-plain</xsl:when>
                <xsl:otherwise>row-color</xsl:otherwise>
            </xsl:choose>
        </xsl:variable>

        <tr class="{$style}">
            <td><xsl:value-of select="title"/></td>
            <td><xsl:value-of select="artist"/></td>
            <td><xsl:value-of select="album"/></td>
            <td><xsl:value-of select="length"/></td>
        </tr>
    </xsl:template>
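
For readers more at home in a procedural language, the two templates behave roughly like the Python sketch below. It is only an analogue for tracing the logic (Syncato itself runs the XSLT through libxslt), and the function name `songlist_to_table` is invented for this example. Note that `position()` in XSLT counts from 1, so the first song gets the row-color class.

```python
import xml.etree.ElementTree as ET


def songlist_to_table(songlist):
    """Build an HTML <table> element from a <songlist> element."""
    table = ET.Element("table", {"class": "table-border"})
    header = ET.SubElement(table, "tr")
    for label in ("Song Name", "Artist", "Album", "Length"):
        ET.SubElement(header, "th").text = label

    # Mirror position() mod 2: even positions are plain, odd ones colored.
    for position, song in enumerate(songlist.findall("song"), start=1):
        style = "row-plain" if position % 2 == 0 else "row-color"
        row = ET.SubElement(table, "tr", {"class": style})
        for field in ("title", "artist", "album", "length"):
            # A missing field yields an empty cell, as xsl:value-of would.
            ET.SubElement(row, "td").text = song.findtext(field, default="")
    return table


SAMPLE = """<songlist>
  <song><title>On The Road Again</title><artist>Canned Heat</artist>
        <album>Canned Heat</album><length>3:23</length></song>
  <song><title>Crossroads</title><artist>Tom Waits</artist>
        <album>The Black Rider</album><length>2:43</length></song>
</songlist>"""

table = songlist_to_table(ET.fromstring(SAMPLE))
print(ET.tostring(table, encoding="unicode"))
```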

This turns the raw XML into a nice HTML table. Using this technique you can define any kind of custom content type you want. Another recent example from my site is based on my photography hobby. I wanted to create a photoblog, so I came up with a custom photo XML format and a simple stylesheet to turn it into HTML. Now any time I want to add a new photo to the site I just add a small XML description of the photo that looks something like this:


  <photo>
    <file>picture.jpg</file>
    <caption>A photo journalistic style caption.</caption>
    <lens>EF 85mm f1.8</lens>
    <camera>Canon 10D</camera>
  </photo>

This microcontent is included as part of the weblog post and can be surrounded with whatever other text or data you want. With the associated XSLT transformation it shows up on the site in a frame with the caption and equipment description below it.

I chose to go with a simple format to start, but it's completely open to whatever you want to add. It would be nice to have a tool that converted the EXIF metadata from a photo into XML. As long as you're careful to build your tools with minimum assumptions about the structure of the data, you can evolve your schemas over time without too much difficulty.
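
As a rough idea of what such an EXIF tool could look like, here is a Python sketch. It assumes the EXIF tags have already been read into a plain dictionary by some image library, which is not shown. The mapping from the EXIF tag names ImageDescription, LensModel, and Model to the caption, lens, and camera elements is my own guess at a sensible correspondence, and `photo_from_exif` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from EXIF tag names to the photo format's elements.
EXIF_TO_ELEMENT = {
    "ImageDescription": "caption",
    "LensModel": "lens",
    "Model": "camera",
}


def photo_from_exif(filename, exif):
    """Build a <photo> element from a file name and a dict of EXIF tags."""
    photo = ET.Element("photo")
    ET.SubElement(photo, "file").text = filename
    for tag, element_name in EXIF_TO_ELEMENT.items():
        value = exif.get(tag)
        if value:  # skip tags that are absent or empty
            ET.SubElement(photo, element_name).text = str(value)
    return photo


# Sample metadata; tags with no mapping (here the ISO value) are ignored.
exif = {"Model": "Canon 10D", "LensModel": "EF 85mm f1.8", "ISOSpeedRatings": 400}
photo = photo_from_exif("picture.jpg", exif)
print(ET.tostring(photo, encoding="unicode"))
```

Because the function emits only the elements it has data for, stylesheets that make few assumptions about structure will keep working as the mapping grows.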

Searching

Converting your custom XML types for presentation is important, but Syncato also has another trick up its sleeve when it comes to searching content. Syncato enables XPath search of the content in the database via the URL field of your browser. For example, I can search my weblog to find all posts that contain song lists with the song "Pride And Joy".

http://www.xmldatabases.org/WK/blog/item//songlist[song/title='Pride%20And%20Joy']

This XPath returns just the portion of the document that you're requesting. The result you get back looks something like this (abbreviated to save space):

<?xml version="1.0"?>
<results>
<songlist>
...
  <song>
    <title>Pride And Joy</title>
    <artist>Various - Soundtrack for a Century</artist>
    <album>Folk, Gospel &amp; Blues: Will The Circle Be Unbroken (Disc 2)</album>
    <length>3:41</length>
  </song>
...
</songlist>
</results>

This is an XML result, so it may not display in some browsers. The result comes back wrapped in a results tag so that you can get more than one query result in the same request without violating the rules of XML.
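
The search-and-wrap behavior can be imitated in a few lines of Python. This is a sketch under stated assumptions, not how Syncato does it: the real system hands the XPath to Berkeley DB XML, whereas the standard library's ElementTree supports only a subset of XPath, so the predicate is written out in Python. The sample posts and the name `find_songlists` are invented.

```python
import xml.etree.ElementTree as ET

# Two made-up posts standing in for the contents of the database.
POSTS = [
    "<item><title>Mix one</title><description>"
    "<songlist><song><title>Pride And Joy</title></song></songlist>"
    "</description></item>",
    "<item><title>Mix two</title><description>"
    "<songlist><song><title>Crossroads</title></song></songlist>"
    "</description></item>",
]


def find_songlists(documents, song_title):
    """Collect every <songlist> containing the given song into <results>."""
    results = ET.Element("results")
    for text in documents:
        root = ET.fromstring(text)
        if root.tag != "item":
            continue  # the query starts at item, so skip other documents
        for songlist in root.iter("songlist"):
            titles = [t.text for t in songlist.findall("song/title")]
            if song_title in titles:
                results.append(songlist)
    return results


results = find_songlists(POSTS, "Pride And Joy")
print(ET.tostring(results, encoding="unicode"))
```

The single results root is what keeps the response well-formed no matter how many song lists match, including none at all.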

We could also run a slightly different query and then transform the result via XSLT to get HTML.

http://www.xmldatabases.org/WK/blog/item[//songlist/song/title='Pride%20And%20Joy']?t=item

In this case we're retrieving the actual weblog post and then formatting it using the XSLT used to display weblog entries. You'll need to follow the link to see the result. Syncato provides a few basic stylesheets and you can add any others that you want.

Another type of query is a content query. This type of query is applied to the result after the XSL transformation has been executed. The intention is to allow you to extract portions of the final HTML. So we can extend the previous query to just retrieve the HTML table containing the song list.

http://www.xmldatabases.org/WK/blog/item[//songlist/song/title='Pride%20And%20Joy']?t=item&c=//table

Again, you need to follow the link to see the result. This request actually runs two queries: the first selects the original result set, and the second, run after the XSLT has been applied, selects a block out of the resulting HTML.
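
To see how the pieces of such a URL fit together, the following Python sketch splits a request into the XPath, the stylesheet name from the t parameter, and the content query from the c parameter. How Syncato really maps URLs to queries isn't covered here, so treating /WK/blog as a fixed prefix with the remainder of the path as the XPath is an assumption, and `parse_request` is a hypothetical name.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Assumed mount point of the weblog; the rest of the path is the XPath.
BASE_PATH = "/WK/blog"


def parse_request(url, base_path=BASE_PATH):
    """Split a request URL into (xpath, stylesheet, content_query)."""
    parts = urlsplit(url)
    path = unquote(parts.path)  # turns %20 back into a space
    if not path.startswith(base_path):
        raise ValueError("URL is outside the weblog: " + url)
    xpath = path[len(base_path):]
    params = parse_qs(parts.query)
    # parse_qs maps each name to a list of values; take the first, if any.
    stylesheet = params.get("t", [None])[0]
    content_query = params.get("c", [None])[0]
    return xpath, stylesheet, content_query


url = ("http://www.xmldatabases.org/WK/blog/"
       "item[//songlist/song/title='Pride%20And%20Joy']?t=item&c=//table")
print(parse_request(url))
```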

Overall the goal here is to provide as much power in reusing your content as possible. You can search the raw XML, transform it in whatever ways you want, and then even extract pieces of the transformed result. It's all about maximum flexibility in data reuse.
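
The whole search, transform, extract sequence can be sketched end to end in Python. This is an illustration only: a plain Python function stands in for the XSLT stage, the sample post and all function names are made up, and ElementTree's `.//table` plays the role of the `//table` content query.

```python
import xml.etree.ElementTree as ET

POST = """<item>
  <title>blues and soul</title>
  <description>
    <songlist>
      <song><title>Pride And Joy</title><length>3:41</length></song>
    </songlist>
  </description>
</item>"""


def select(item, song_title):
    """Stage 1: keep the post only if one of its song lists has the song."""
    titles = [t.text for t in item.findall(".//songlist/song/title")]
    return item if song_title in titles else None


def transform(item):
    """Stand-in for the XSLT stage: render the post as an HTML fragment."""
    div = ET.Element("div", {"class": "post"})
    ET.SubElement(div, "h2").text = item.findtext("title", default="")
    table = ET.SubElement(div, "table")
    for song in item.findall(".//songlist/song"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = song.findtext("title", default="")
        ET.SubElement(row, "td").text = song.findtext("length", default="")
    return div


def content_query(html, path):
    """Stage 2: run a second query against the transformed output."""
    return html.findall(path)


item = select(ET.fromstring(POST), "Pride And Joy")
html = transform(item)
tables = content_query(html, ".//table")
print(ET.tostring(tables[0], encoding="unicode"))
```

Keeping the stages separate is what makes the reuse possible: each one takes XML in and hands XML out, so any stage can be skipped or swapped.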

Conclusion

Syncato is a powerful system that leverages XML technologies, particularly XPath and XSLT, to the maximum extent possible. It manages your personal data and attempts to provide tools that give you a lot of flexibility and power. This article just scratches the surface of what you can do with it, but it should give you some idea of what is possible. Simple weblogs are the most common use of the system, but it's capable of much more.