XML.com: XML From the Inside Out

DSSSL for XML: Why not?

May 02, 2000

In his recent article on XML.com, Fabio Arciniegas A. explained the "grove." This week, I'll talk about a grove processing and styling language, DSSSL (pronounced "deesel"). DSSSL stands for Document Style Semantics and Specification Language. Like groves, DSSSL is a technology with its roots in SGML, and it sometimes suffers from bad press in the XML world. Yet, as you'll notice after reading this article, a DSSSL style sheet can be as simple to read and understand as a CSS style sheet. Both are rule-based languages, as indeed is XSLT.

DSSSL is mainly intended for processing SGML documents but, since XML 1.0 documents are also SGML documents, they can be processed by a DSSSL engine. The OpenJade DSSSL implementation can be used to style DTD-less, well-formed XML documents, just as you would with CSS or XSLT. If you use XSLT, you'll find that DSSSL has some familiar processing concepts.

Ready to explore the strange lands of list-based languages? Afraid of parentheses? Don't worry, the DSSSL monster is in fact a gentle creature. I'll start by introducing the OpenJade project.

The OpenJade project

OpenJade is an open-source project that originated with the DSSSL mailing list members. The first code base was provided by James Clark, who was also the editor of the W3C's XSLT 1.0 Recommendation. Since then, a small group of developers has improved the code and some, like Matthias Clasen, have invested a lot of their time adding new features and resolving bugs.

Basically, OpenJade is a DSSSL interpreter implemented in C++. The code is freely available from the CVS server hosted by SourceForge. OpenJade has been successfully compiled on multiple platforms such as Linux, Solaris, and Windows. More information about the project and useful pointers to DSSSL resources are available at the OpenJade home page.

The Architecture

The OpenJade engine's architecture is organized as a hierarchy of modules, as shown in the figure below. The XML document is first parsed by the SGML/XML parser, then transformed into a grove. The DSSSL engine then transforms the document using a DSSSL style sheet and one of the styling drivers.

A key concept that you need to grasp is that of formatting objects (DSSSL calls them flow objects). These are the basic building blocks of page layout, each having a representation in the target formatting language (such as HTML or RTF).

Example XML document

The XML document we'll use as our example is a simple book catalog document, as shown below.

<?xml version="1.0"?>
<book-catalog> 
     <book>  
         <code>16-048</code>
         <category>Scripting</category>
         <title>Instant JavaScript</title>
         <price>49.34$</price>
     </book>  
     <book>
         <code>16-105</code>
         <category>XML</category>
         <title>Professional XML</title>
         <price>39.95$</price>
     </book>
</book-catalog>

You can download this document here.

The SGML/XML parser

The parser is an SGML parser, able to parse XML documents with or without validation. However, the current version of OpenJade still needs an SGML declaration file on the command line even when no DTD is required, as illustrated by the command line used to process our example:

openjade -t html -wno-valid xml.dcl -d catalog.dsl catalog.xml

The command line states that the styling driver to be used is the HTML driver (-t html), and that no validation is to be performed on the XML document (-wno-valid). Because the parser is still an SGML parser, it needs the SGML declaration for the XML format (xml.dcl). The DSSSL style sheet is specified with the -d catalog.dsl option. (You can download catalog.dsl here.) The final argument to the command line is the path of the XML document to process. If you want to transform the XML into output formats other than HTML, alter the -t option (e.g., -t rtf for Rich Text Format output). PDF styling is currently a work in progress.

The Grove Manager

The grove manager is the "event sink" for parsing events fired by the parser, OpenSP or SP (this works in a similar way to event-based XML parser mechanisms such as SAX). The grove manager interfaces with the DSSSL engine, which uses the grove to fire "construction rules." A construction rule matches on a grove node, and the rule's body provides the instructions used to build a flow object tree.

The DSSSL Style Engine

The DSSSL engine module takes as its input the XML document's grove and the DSSSL style sheet document, itself an SGML document, as shown below. (Because SGML allows you to omit tags, no elements are required to enclose the DSSSL construction rules.)

<!doctype style-sheet PUBLIC "-//Netfolder//DTD DSSSL library//EN">
(root
   (make scroll
       (process-children)
   )
) 

(element book
   (make paragraph
      start-indent: 30pt
      font-family-name: "Times New Roman"
      font-size: 12pt
      font-weight: 'medium
      space-after: 10pt
      (process-matching-children "title")
      (process-matching-children "category")
      (process-matching-children "code")
      (process-matching-children "price") 
   )
) 

(element title
   (make paragraph
      font-family-name: "Arial"
      font-size: 14pt
      font-weight: 'bold
   )
) 

(element category
   (make paragraph
      (literal "category:")
      (process-children)
   )
)  

(element code
   (make paragraph
      (literal "code:")
      (process-children)
   )
) 

(element price
   (make paragraph
      (literal "price:")
      (process-children)
   )
)

The Element Construction Rule

The element construction rule is quite simple. The rule's pattern-match component contains the element to be matched. The DSSSL engine will match a grove element's node against an element construction rule. When a match occurs, the rule body is executed, building a flow object tree.

We will transform our book catalog document into both HTML and RTF. For both transformations, we'll use much the same flow objects, with the exception of the main document container. For HTML, the main document container will be scroll (for scrolled visual media); for RTF, it will be simple-page-sequence.

The main document container is specified by the first construction rule. This rule matches the root node of the document grove:

(root
   (make scroll
       (process-children)
   )
)

The process-children construct instructs the DSSSL engine to process all of the current node's child nodes. This construct is equivalent to the XSLT <xsl:apply-templates/> element. In both languages, the styling engine tries to match the document's nodes either to templates (in XSLT) or to construction rules (DSSSL).
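Note that the catalog style sheet contains no rule for <book-catalog> itself. The sketch below is not part of the original catalog.dsl; it assumes, in line with the usual DSSSL processing model, that an element with no matching construction rule simply has its children processed, and it writes that fallback out explicitly so the path from the root rule down to the <book> rules is visible.

```dsssl
; Hypothetical explicit rule for the top-level element.
; It creates no flow object of its own; it only hands
; processing on to the <book> children.
(element book-catalog
   (process-children)
)
```

Adding this rule should not change the output. It only makes explicit the step that otherwise happens implicitly between the root rule and the book rule.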

Each <book> element is then matched with its corresponding construction rule:

(element book
    (make paragraph
        start-indent: 30pt
        font-family-name: "Times New Roman"
        font-size: 12pt
        font-weight: 'medium
        space-after: 10pt
        (process-matching-children "title")
        (process-matching-children "category")
        (process-matching-children "code")
        (process-matching-children "price")
    )
)

Each time a <book> element is matched in the grove, the DSSSL engine "makes" a paragraph object. The paragraph's visual appearance is specified by a set of properties. In the above style sheet fragment, we indicate that the paragraph is indented (start-indent: 30pt), that its font is set to 12 point Times New Roman, medium-weight (font-family-name: "Times New Roman", font-size: 12pt, font-weight: 'medium), and finally that the space after the paragraph is to be 10 points (space-after: 10pt).

Formatting objects created for the child elements inherit the font and indent properties set here, unless a child rule overrides them (space-after, by contrast, applies only to the paragraph on which it is set). So, the <title>, <category>, <code>, and <price> elements' formatting objects will be displayed according to these properties. The construction rules matching these elements are fired by the process-matching-children construct. As in XSLT, specific elements to process can be specified. So, for instance, the construct (process-matching-children "title") will cause the DSSSL engine to process any <title> elements that are children of the current <book> element.
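Because the children to process are named one by one, the rule body also decides which children appear and in what order. The variant below is only an illustration, not part of the original catalog.dsl: it uses the same constructs as the book rule but lists the price first and leaves out the category and code.

```dsssl
; Illustrative variant of the book rule: price before title,
; category and code omitted from the output.
(element book
   (make paragraph
      start-indent: 30pt
      font-size: 12pt
      (process-matching-children "price")
      (process-matching-children "title")
   )
)
```

The XML document itself is untouched; only the order of the process-matching-children calls changes what the reader sees.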

Finally, the elements contained in the <book> element are processed by a construction rule matching the contained elements. Here's an example for <category>:

(element category 
    (make paragraph 
        (literal "category:") 
        (process-children) 
    ) 
) 

The literal construct is used here to insert a string before the <category> data content. The element's data content is included in the resulting output tree by the process-children construct.
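As written, the label and the data run together ("category:Scripting"), because the literal string has no trailing space. The sketch below is a hypothetical refinement, not taken from the original style sheet: it adds the space and wraps the label in a sequence flow object so that only the label is set in bold.

```dsssl
; Hypothetical refinement of the category rule.
; The sequence flow object groups the label so that the bold
; weight applies to the label only, not to the element content.
(element category
   (make paragraph
      (make sequence
         font-weight: 'bold
         (literal "category: ")
      )
      (process-children)
   )
)
```

The same pattern would apply to the code and price rules.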

The OpenJade engine allows you to create output in different formats with nearly the same style sheet. The only construction rule to be modified is the root rule. For HTML output, the root rule "makes" a scroll object. For fixed-page formats like RTF, TeX, or MIF (FrameMaker), the root rule "makes" a simple-page-sequence object. This is the only modification required, as the other formatting objects (like the paragraph) are mapped directly to the different rendering languages' constructs.
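For paged output, the root rule described above might look like the following sketch. The page size and margin values are assumptions chosen for illustration (an A4 page with 25mm margins); they do not come from the original catalog.dsl and can be dropped to fall back on the driver's defaults.

```dsssl
; Root rule for paged output such as RTF.
; This replaces the scroll-based root rule; the element rules
; for book, title, category, code, and price stay unchanged.
(root
   (make simple-page-sequence
      page-width: 210mm      ; assumed A4 page size
      page-height: 297mm
      left-margin: 25mm      ; assumed margins
      right-margin: 25mm
      top-margin: 25mm
      bottom-margin: 25mm
      (process-children)
   )
)
```

With this rule in place, the earlier command line is run with -t rtf instead of -t html: openjade -t rtf -wno-valid xml.dcl -d catalog.dsl catalog.xml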

The result of the transformation into HTML, shown below, is in fact two documents created by the OpenJade engine: an HTML document, and a CSS document. The style applied to each HTML formatting object is simply specified by the CSS document. The end result can be displayed with any browser able to interpret CSS styling.

Conclusion

As you have probably noticed, DSSSL has some traits reminiscent of both CSS and XSLT. As in CSS, each formatting object is specified with a set of properties. As in XSLT, processing order is controlled by processing constructs (process-children, process-matching-children). These constructs allow you to change the display order and thus separate the view from the model.

So, when can you use DSSSL? At the moment, DSSSL is the best choice for transforming XML into documents for print purposes. The nearest W3C-blessed alternative, XSL formatting objects, is still in development. Offering easily maintainable output to HTML, RTF, TeX, and MIF, DSSSL is particularly well suited to documents targeted at multiple output media.