XML.com: XML From the Inside Out

Moving Home: Portable Site Information

March 22, 2000




One common use of XML is to provide data for template-based web pages created with XSLT. However, XML can be used to model the actual structure of a web site too. Portable Site Information is a project to develop an XML abstraction for template-based web sites, to allow their migration between site development frameworks such as NetObjects or Cocoon.

Setting the Scene

Your first professional site. Objective: Hack two decades of untamed document growth into inspiring corporate propaganda. Lines of attack:

  1. Hand coding: Get text editor. Change document by hand. Verify changes didn't break anything. Make corrections. Repeat. Site spontaneously emerges fully upright from primordial document slime.

  2. Site manager: Based on an old military maxim: he who wins the high ground wins the battle. Hit documents from above. Herd them into categories. Root out individuality. Site is force-marched out of chaos by leaving no other option open but surrender.

Verdict: Choose the site manager. From the commanding heights you can do no wrong. Triumph is mere formality.

Achilles heel: An itch to tinker. There are better, more ambitious site management frameworks out there. Why not cap your success by migrating your site over? Never fear. Everything else has gone so smoothly.

Post-mortem: Death by reason of incompatibility. The site manager formats explode on contact. Carefully crafted site structure collapses in the blast. Nothing to do but collect the pieces for reassembly. Hopefully, someone at least learned their lesson.

I did. Years later, my friend Patrick Hayes was building yetanother.com with a NetObjects frontend and a Midgard backend. He wanted NetObjects templates to translate directly into Midgard themes and back. I proposed an XML-based format to ensure the templates passed cleanly between the two applications. From this came Portable Site Information (PSI).

Why Portable Site Information?

Ambitious sites require a god's-eye view. They suffer from epidemic link rot, content decay, navigation tangle, and sitcom-caliber coherence. The grassroots school of web design can't help. Site builders are an urgent necessity, not a luxury.

But, if they're so vital, why don't site management frameworks share even a lowest common denominator compatibility? Cliche: the whole is greater than the sum of its parts. Site managers do the reverse. They deal in parts and sacrifice the whole to application whim. Interoperability strategies are trotted out to mask this gap, ranging from simple metadata exchange to site managers chatting over HTTP. Yet nothing, in the end, can replace the freedom to take your data and move on to greener pastures if you so choose.

PSI extends this freedom to site structure. PSI maps a site not as a collection of parts but as a complete whole. Site structure is preserved in an open format, safe from application-specific chains. Sites can then be moved freely between different site builders without fear of structure loss. To ensure this, PSI's design goals are:

  1. A flexible hierarchy of containers.
  2. Clearly separated shared and unique data.
  3. Exact definition of container position in time and space.
  4. Filters for application-specific processing.
  5. Metadata hooks.

Future PSI may include:

  1. Mapping security through access control lists.
  2. Version tracking.
  3. Scheduling for site work coordination.

Basic Site Portability

All sites have at least three common aspects:

  1. Data.
  2. Hierarchy, in how a site is stored, how it is organized, or both.
  3. Unique instances.

To make a site portable, PSI must account for each of these. Representing data is easy. Mark it as data:

<data>Heart of Darkness</data>

Site hierarchy exists for only one reason: data must appear only when needed. This makes unique instances inevitable. If only part of your data needs to appear, there are at least two unique instances: the data that appears and the data that doesn't. PSI must allow data to be sorted into the right unique instances and stacked in the proper order.

Within sites, most unique instances appear as documents, usually web pages. Within these, data can be subdivided again and presented as paragraphs or tables. PSI maps these with sets.

<set id='conradTitle'>
  <data><h2>Heart of Darkness</h2></data>
</set> 

This is enough for most sites. They're nothing more than a series of static instances. Other sites need more. Modern site design emphasizes taking the shortest route to site completion possible. This requires minimizing redundant data, usually by sharing common data between otherwise unique instances. Sets model this with a global or local scope. If global, data is shared. If local, data belongs to a unique instance (e.g., a particular page on a site). In this example, the title "Disturbed Works" is global, whereas the title "Heart of Darkness" applies to a particular page.

<global> 
  <set id='globalTitle'>
    <data><h1>Disturbed Works</h1></data>
  </set>
</global>
<local id='conrad'>
  <set id='conradTitle'>
    <data><h2>Heart of Darkness</h2></data>
  </set>
</local>
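
To make the scope distinction concrete, here is a minimal Python sketch that reads the fragment above and sorts its sets by scope. It is an illustration, not part of PSI: the wrapping psi element and the sets_by_scope function are hypothetical names, added only so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# The article's fragment, wrapped in a hypothetical <psi> root element so
# that it parses as a single well-formed document.
PSI = """
<psi>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
  </global>
  <local id='conrad'>
    <set id='conradTitle'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
</psi>
"""


def sets_by_scope(psi_text):
    """Return the set ids found in the global container and in each local."""
    root = ET.fromstring(psi_text)
    scopes = {"global": [], "local": {}}
    for container in root:
        ids = [s.get("id") for s in container.findall("set")]
        if container.tag == "global":
            scopes["global"].extend(ids)      # shared data
        elif container.tag == "local":
            scopes["local"][container.get("id")] = ids  # one unique instance
    return scopes


print(sets_by_scope(PSI))
# {'global': ['globalTitle'], 'local': {'conrad': ['conradTitle']}}
```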

The other half of data sharing is inserts. Inserts mark where sets can be reused in a PSI hierarchy outside of their original location. Like sets, inserts have global or local scope. Global inserts are shared by multiple sets; local inserts are monopolized by a single set.

Global sets and inserts can be combined to represent a sort of shared template. Sets map the shared data and inserts mark where unique data goes. This allows PSI to map sites that use sitewide content generation mechanisms like themes, shared borders, and server-side includes (typically these sites use technologies such as PHP, ASP, or ColdFusion).

<global>
  <set id='globalTitle'>
    <data><h1>Disturbed Works</h1></data>
  </set>
  <insert global='title'/>
</global>
<local id='conrad'>
  <set id='conradTitle' insert='title'>
    <data><h2>Heart of Darkness</h2></data>
  </set>
</local>
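
One way to read this "template" is as a page assembly rule: walk the global container in order, emit shared sets as they come, and fill each insert with the local sets that name it. The Python sketch below works under that assumption. The psi wrapper element and the compose and inner_markup functions are hypothetical, and a real adapter may resolve inserts differently.

```python
import xml.etree.ElementTree as ET

# The article's fragment, wrapped in a hypothetical <psi> root element.
PSI = """
<psi>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local id='conrad'>
    <set id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
</psi>
"""


def inner_markup(data):
    """Serialize the content of a <data> element back to markup."""
    text = (data.text or "").strip()
    children = "".join(
        ET.tostring(child, encoding="unicode").strip() for child in data
    )
    return text + children


def compose(psi_text, local_id):
    """Assemble one page: shared sets in order, inserts filled from the local."""
    root = ET.fromstring(psi_text)
    shared = root.find("global")
    page = root.find(f"local[@id='{local_id}']")
    out = []
    for node in shared:
        if node.tag == "set":
            out.extend(inner_markup(d) for d in node.findall("data"))
        elif node.tag == "insert":
            slot = node.get("global")
            # Assumption: a local set fills the slot named by its insert attribute.
            for local_set in page.findall("set"):
                if local_set.get("insert") == slot:
                    out.extend(inner_markup(d) for d in local_set.findall("data"))
    return out


print(compose(PSI, "conrad"))
# ['<h1>Disturbed Works</h1>', '<h2>Heart of Darkness</h2>']
```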

The range of a global "template" is constrained by the group element. Groups can contain one global container and an unconstrained number of local containers. A group is often used to model containers like directories. To capture site hierarchy more accurately, groups can also be nested inside other groups.

<group id='root'>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local id='conrad'>
    <set id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
  <local id='theEnd'>
    <set id='next' insert='title'>
      <data><h2>The End</h2></data>
    </set>
  </local>
</group>
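
If groups stand for directories, nested groups give every local container a path. The Python sketch below walks a group tree and lists those paths. It is only an illustration: the nested group with id 'appendix' is a hypothetical addition to show nesting, and local_paths is an invented helper, not a PSI API.

```python
import xml.etree.ElementTree as ET

# The example above, with the 'theEnd' page moved into a nested group.
# The nested group id 'appendix' is hypothetical.
PSI = """
<group id='root'>
  <global>
    <set id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local id='conrad'>
    <set id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
  <group id='appendix'>
    <local id='theEnd'>
      <set id='next' insert='title'>
        <data><h2>The End</h2></data>
      </set>
    </local>
  </group>
</group>
"""


def local_paths(group, prefix=""):
    """Return a directory-like path for every local container under a group."""
    path = f"{prefix}/{group.get('id')}"
    paths = [f"{path}/{local.get('id')}" for local in group.findall("local")]
    for child in group.findall("group"):
        # Nested groups extend the path of their parent group.
        paths.extend(local_paths(child, path))
    return paths


print(local_paths(ET.fromstring(PSI)))
# ['/root/conrad', '/root/appendix/theEnd']
```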

Because PSI maps other site building blocks, it needs a method for determining when its containers should be transformed into other common site structures. PSI provides internal filters that list the specific rules and conditions that must be met to do this. The class attribute groups PSI data, and the filters align those groups with specific rules. An application-specific adapter processes the PSI data according to the rules and conditions listed in the filters and routes the site structure to the desired format.

<group class='container' id='root'>
  <global class='shared'>
    <filter role='in' resolve='pass'>
      <if classid='container'/>
      <then role='map' value='folder'/>
    </filter>
    <filter role='in' resolve='pass'>
      <if classid='shared'/>
      <then role='map' value='shared border'/>
    </filter>
    <filter role='in' resolve='pass'>
      <if classid='page'/>
      <then role='map' value='html'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='page'/>
      <then role='map' value='html'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='section'/>
      <then role='map' value='div'/>
    </filter>
    <set class='section' id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
    <insert global='title'/>
  </global>
  <local class='page' id='conrad'>
    <set class='section' id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
  <local class='section' id='theEnd'>
    <set class='div' id='next' insert='title'>
      <data><h2>The End</h2></data>
    </set>
  </local>
</group>
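
What an adapter does with these filters can be sketched in a few lines of Python. This is a guess at the simplest useful behavior, not psilib's implementation: it reads each filter as "if a container has this class, map it to this value", lets the first matching filter win, and ignores the role and resolve attributes on filter, whose exact semantics are not spelled out here. It also assumes every container carries its class in a class attribute. The sample is a trimmed version of the example above, and load_rules and map_structure are hypothetical names.

```python
import xml.etree.ElementTree as ET

# A trimmed version of the example above. Assumption: containers carry
# their class in a 'class' attribute, and filters refer to it via 'classid'.
PSI = """
<group class='container' id='root'>
  <global class='shared'>
    <filter role='in' resolve='pass'>
      <if classid='container'/>
      <then role='map' value='folder'/>
    </filter>
    <filter role='in' resolve='pass'>
      <if classid='shared'/>
      <then role='map' value='shared border'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='page'/>
      <then role='map' value='html'/>
    </filter>
    <filter role='in' resolve='end'>
      <if classid='section'/>
      <then role='map' value='div'/>
    </filter>
    <set class='section' id='globalTitle'>
      <data><h1>Disturbed Works</h1></data>
    </set>
  </global>
  <local class='page' id='conrad'>
    <set class='section' id='conradTitle' insert='title'>
      <data><h2>Heart of Darkness</h2></data>
    </set>
  </local>
</group>
"""

# Only PSI containers are mapped; filters and data content are skipped.
STRUCTURAL = ("group", "global", "local", "set")


def load_rules(root):
    """Build a class -> target mapping from the filters (first match wins)."""
    rules = {}
    for flt in root.iter("filter"):
        cond, action = flt.find("if"), flt.find("then")
        if cond is None or action is None or action.get("role") != "map":
            continue
        rules.setdefault(cond.get("classid"), action.get("value"))
    return rules


def map_structure(psi_text):
    """Return (id, target structure) for each container, in document order."""
    root = ET.fromstring(psi_text)
    rules = load_rules(root)
    return [
        (el.get("id"), rules.get(el.get("class")))
        for el in root.iter()
        if el.tag in STRUCTURAL
    ]


for item in map_structure(PSI):
    print(item)
```

A container whose class has no matching filter maps to None here, which an adapter could treat as "leave untouched" or as an error, depending on the target site builder.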

In Conclusion

Work on PSI is ongoing. Currently, PSI uses a standard DTD to define its syntax, but we plan to migrate it to an RDF schema. This will allow us to use PSI with more tools, and to combine it with other RDF formats (like Dublin Core and RSS) to create even more powerful site models.

We're cleaning up our current code into an LGPL library called psilib and then releasing it through psilib.sourceforge.net. It's turning into a useful tool for us and may benefit others, which only makes our jobs as web developers easier, especially if we get future site projects already laid out in PSI.

XML is proven in modeling complex hierarchies for open exchange. Sites are no exception. Developing PSI has helped us glimpse the underlying patterns of the Web. More connects than divides site structures. We hope to see a standard reflecting this evolve so that any pain from future site evolution comes as a side effect of creation, not transportation. PSI may contribute to this. It may just dimly light the way. The end of portable site information is more important than the means.