DSSSL for XML: Why not?
In his recent article on XML.com, Fabio Arciniegas A. explained the "grove." This week, I'll talk about a grove processing and styling language, DSSSL (pronounced "deesel"). DSSSL stands for Document Style Semantics and Specification Language. Like groves, DSSSL is a technology with its roots in SGML, and it sometimes suffers from bad press in the XML world. In fact, as you'll notice after reading this article, a DSSSL style sheet can be as simple to read and understand as a CSS style sheet. Both are rule-based languages, as indeed is XSLT.
DSSSL is mainly intended for processing SGML documents but, since XML 1.0 documents are also SGML documents, they can be processed by a DSSSL engine. The OpenJade DSSSL implementation can be used to style DTD-less well-formed XML documents, as you would with CSS or XSLT. If you use XSLT, you'll find that DSSSL has some familiar processing concepts.
Ready to explore the strange lands of list-based languages? Afraid of parentheses? Don't worry, the DSSSL monster is in fact a gentle creature. I'll start by introducing the OpenJade project.
The OpenJade project
OpenJade is an open-source project that originated with the DSSSL mailing list members. The first code base was provided by James Clark, who was also the editor of the W3C's XSLT 1.0 Recommendation. Since then, a small group of developers has improved the code and some, like Matthias Clasen, have invested a lot of their time adding new features and resolving bugs.
Basically, OpenJade is a DSSSL interpreter implemented in C++. The code is freely available from the CVS server hosted by SourceForge. OpenJade has been successfully compiled on multiple platforms such as Linux, Solaris, and Windows. More information about the project and useful pointers to DSSSL resources are available at the OpenJade home page.
The OpenJade engine's architecture is organized as a hierarchy of modules, as shown in the figure below. The XML document is first parsed by the SGML/XML parser, then transformed into a grove. The DSSSL engine then transforms the document using a DSSSL style sheet and one of the styling drivers.
A key concept that you need to grasp is that of formatting objects. These are the basic building blocks of page layout, each having a representation in the target formatting language (such as HTML or RTF).
Example XML document
The XML document we'll use as our example is a simple book catalog document, as shown below.
You can download this document here.
The SGML/XML parser
The parser is an SGML parser, able to parse XML documents with or without validation. However, the current version of OpenJade still needs an SGML declaration file to be included in the command line even if the DTD is not required, as illustrated in the command line used to process our example:
openjade -t html -wno-valid xml.dcl -d catalog.dsl catalog.xml
The command line states that the styling driver to be used is the HTML driver (-t html), and that no validation is to be performed on the XML document (-wno-valid). Because the parser is still an SGML parser, it needs the SGML declaration for the XML format (xml.dcl). The DSSSL style sheet is specified with the -d catalog.dsl option. (You can download catalog.dsl here.) The final argument to the command line is the path of the XML document (catalog.xml). If you want to transform the XML into output formats other than HTML, change the -t option (e.g., -t rtf for Rich Text Format output). PDF styling is currently a work in progress.
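If you run the engine from a script, the pieces of the command line described above can be assembled programmatically. The following is only a sketch: the helper name build_openjade_command and its parameters are invented for illustration, the argument order simply mirrors the command shown in this article, and the assumption that validation is switched on by leaving out -wno-valid is mine.

```python
import shlex


def build_openjade_command(backend, stylesheet, document,
                           declaration="xml.dcl", validate=False):
    """Assemble the argument list for one openjade run (hypothetical helper)."""
    command = ["openjade", "-t", backend]      # -t selects the styling driver
    if not validate:
        command.append("-wno-valid")           # skip validation of the XML document
    # SGML declaration, then the style sheet option, then the document itself,
    # in the same order as the command line used in the article.
    command += [declaration, "-d", stylesheet, document]
    return command


html_command = build_openjade_command("html", "catalog.dsl", "catalog.xml")
rtf_command = build_openjade_command("rtf", "catalog.dsl", "catalog.xml")

print(shlex.join(html_command))
print(shlex.join(rtf_command))
```

The resulting list can be handed to subprocess.run on a machine where openjade is installed; the sketch itself only builds and prints the commands.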
The Grove Manager
The grove manager is the "event sink" for parsing events fired by OpenSP or SP (this happens in a similar way to event-based XML parser mechanisms such as SAX). The grove manager interfaces with the DSSSL engine, which uses the grove to fire "construction rules." A construction rule matches on a grove node, and the rule's body provides the instructions used to build a flow object tree.
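To make the "event sink" idea concrete, here is a minimal sketch using Python's standard SAX interface rather than OpenSP. It is an analogy, not OpenJade's actual code: the Node and TreeBuilder classes are invented for illustration, the tree they build is far simpler than a real grove, and the sample document (including the catalog element name) is a guess at what a small book catalog might look like.

```python
import xml.sax


class Node:
    """A very reduced stand-in for a grove node."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []   # child Nodes and text strings, in document order


class TreeBuilder(xml.sax.ContentHandler):
    """Receives parsing events and turns them into a tree of Nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._current = self.root

    def startElement(self, name, attrs):
        node = Node(name, self._current)
        self._current.children.append(node)
        self._current = node

    def endElement(self, name):
        self._current = self._current.parent

    def characters(self, content):
        if content.strip():                    # ignore whitespace-only text
            self._current.children.append(content)


def build_tree(xml_text):
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


# Hypothetical sample document, not the article's catalog.xml.
SAMPLE = "<catalog><book><title>XML Basics</title><price>20</price></book></catalog>"
tree = build_tree(SAMPLE)
```

Once the parser has finished firing events, the tree is complete and a styling step can walk it, which is the hand-off between the grove manager and the DSSSL engine described above.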
The DSSSL Style Engine
The DSSSL engine module takes as its input the XML document's grove and the DSSSL style sheet document, itself an SGML document, as shown below. (Because SGML allows you to omit tags, no elements are required to enclose the DSSSL construction rules.)
<!doctype style-sheet PUBLIC "-//Netfolder//DTD DSSSL library//EN">

(root
  (make scroll
    (process-children)))

(element book
  (make paragraph
    start-indent: 30pt
    font-family-name: "Times New Roman"
    font-size: 12pt
    font-weight: 'medium
    space-after: 10pt
    (process-matching-children "title")
    (process-matching-children "category")
    (process-matching-children "code")
    (process-matching-children "price")))

(element title
  (make paragraph
    font-family-name: "Arial"
    font-size: 14pt
    font-weight: 'bold))

(element category
  (make paragraph
    (literal "category:")
    (process-children)))

(element code
  (make paragraph
    (literal "code:")
    (process-children)))

(element price
  (make paragraph
    (literal "price:")
    (process-children)))
The Element Construction Rule
The element construction rule is quite simple. The rule's pattern-match component names the element to be matched. The DSSSL engine matches each element node in the grove against the element construction rules. When a match occurs, the rule body is executed, building a flow object tree.
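The matching process can be sketched in a few lines of Python. This is a toy model rather than a description of how OpenJade is implemented: the names element, make, process_children, and style are borrowed from DSSSL vocabulary but are defined here from scratch, flow objects are plain dictionaries, and the behaviour of default_rule for unmatched elements is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

RULES = {}  # element name -> construction rule


def element(name):
    """Register a construction rule for elements called `name`."""
    def register(rule):
        RULES[name] = rule
        return rule
    return register


def make(flow_class, *content, **properties):
    """Build one flow object as a plain dictionary."""
    return {"class": flow_class, "properties": properties, "content": list(content)}


def process_children(node):
    """Apply the matching rule to every child element, in document order."""
    return [process_node(child) for child in node]


def process_node(node):
    rule = RULES.get(node.tag, default_rule)
    return rule(node)


def default_rule(node):
    # Assumption for this sketch: an element without a rule of its own
    # simply contributes whatever its children produce.
    return make("sequence", *process_children(node))


@element("book")
def book_rule(node):
    return make("paragraph", *process_children(node), start_indent="30pt")


@element("title")
def title_rule(node):
    return make("paragraph", node.text or "", font_weight="bold")


def style(document_element):
    # Plays the part of the root rule: wrap everything in one container.
    return make("scroll", process_node(document_element))


flow_tree = style(ET.fromstring(
    "<catalog><book><title>XML Basics</title></book></catalog>"))
```

Each rule receives the matched node and returns a flow object, so the nested calls produce a flow object tree whose shape follows the rules rather than the raw document.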
We will transform our book catalog document into both HTML and RTF. For both transformations, we'll use much the same flow objects, with the exception of the main document container. For HTML, the main document container will be scroll (for scrolled visual media); for RTF, it will be simple-page-sequence.
The main document container will be specified by the first construction rule. This rule matches the document grove node:
(root
  (make scroll
    (process-children)))
The process-children construct instructs the DSSSL engine to process all the element's child nodes. This construct is equivalent to the XSLT <xsl:apply-templates/> element. In both languages, the styling engine tries to match the document's nodes either to templates (in XSLT) or to construction rules (in DSSSL).
The <book> element is then matched with its corresponding construction rule:
(element book
  (make paragraph
    start-indent: 30pt
    font-family-name: "Times New Roman"
    font-size: 12pt
    font-weight: 'medium
    space-after: 10pt
    (process-matching-children "title")
    (process-matching-children "category")
    (process-matching-children "code")
    (process-matching-children "price")))
Each time a <book> element is matched in the grove, the DSSSL engine "makes" a paragraph object. The paragraph's visual appearance is specified by a set of properties. In the above style sheet fragment, we indicate that the paragraph is indented (start-indent: 30pt), that its font is set to 12 point Times New Roman, medium-weight (font-family-name: "Times New Roman", font-size: 12pt, font-weight: 'medium), and finally that the space after the paragraph is to be 10 points (space-after: 10pt). All subsequent child formatting objects will inherit these properties, so the <title>, <category>, <code>, and <price> elements' formatting objects will be displayed according to these properties unless their own rules say otherwise, as the title rule does for its font. The construction rules matching these elements are fired by the process-matching-children construct. As in XSLT, specific elements to process can be specified. So, for instance, the construct (process-matching-children "title") will cause the DSSSL engine to match <title> elements that are children of the <book> element.
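Because the book rule names the children it wants one after another, the output order is set by the rule and not by the document. The short Python sketch below imitates that behaviour; process_matching_children and style_book are stand-ins written for this illustration, and the sample <book> element (with its children deliberately out of order) is invented.

```python
import xml.etree.ElementTree as ET


def process_matching_children(node, name):
    """Select the children of `node` called `name`, in document order."""
    return [child for child in node if child.tag == name]


def style_book(book):
    """Visit the children in the order the book rule lists them."""
    ordered = []
    for name in ("title", "category", "code", "price"):
        ordered.extend(process_matching_children(book, name))
    return [(child.tag, (child.text or "").strip()) for child in ordered]


# Hypothetical input: the children are not in the order the rule asks for.
BOOK = ET.fromstring(
    "<book><price>20</price><code>X1</code>"
    "<title>XML Basics</title><category>markup</category></book>"
)

for tag, text in style_book(BOOK):
    print(tag, text)
```

Changing the order of the names in style_book changes the order of the output without touching the document, which is the same lever the book rule gives you in the style sheet.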
Finally, the elements contained in the <book> element are processed by a construction rule matching the contained elements. Here's an example for <category>:
(element category
  (make paragraph
    (literal "category:")
    (process-children)))
The literal construct is used here to insert a string before the <category> data content. The element's data content is included in the resulting output tree by the process-children construct.
The OpenJade engine allows you to create output in different formats with nearly the same style sheet. The only construction rule to be modified is the root rule. For HTML output, the root rule "makes" a scroll object. For fixed page formats like RTF, TeX, or MIF, the root rule "makes" a simple-page-sequence object. This is the only modification required, as the other formatting objects (like the paragraph) are mapped directly onto each rendering language.
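One way to picture how little changes between formats is to generate the root rule from the chosen output format. The sketch below does just that in Python. It is illustrative only: the function root_rule and the lookup table are made up for this example, and while "html" and "rtf" are the -t values used in this article, "tex" and "mif" are assumed spellings for the other two formats.

```python
# Flow object class made by the root rule for each output format.
# "html" and "rtf" appear in the article's command lines; "tex" and "mif"
# are assumed names for the other two formats it mentions.
ROOT_FLOW_OBJECT = {
    "html": "scroll",
    "rtf": "simple-page-sequence",
    "tex": "simple-page-sequence",
    "mif": "simple-page-sequence",
}


def root_rule(backend):
    """Return the text of a root construction rule for `backend`."""
    flow_object = ROOT_FLOW_OBJECT[backend]
    return f"(root\n  (make {flow_object}\n    (process-children)))"


print(root_rule("html"))
print(root_rule("rtf"))
```

Every other rule in the style sheet stays exactly as it is; only the text produced by root_rule differs from one format to the next.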
The result of the transformation into HTML, shown below, is in fact two documents created by the OpenJade engine: an HTML document, and a CSS document. The style applied to each HTML formatting object is simply specified by the CSS document. The end result can be displayed with any browser able to interpret CSS styling.
As you probably noticed, DSSSL has some traits redolent of both CSS and XSLT. As in CSS, each formatting object is specified with a set of properties. As in XSLT, processing order is controlled by a processing construct such as process-matching-children. This construct allows you to change the display order and thus separate the view from the model.
So, when can you use DSSSL? At the moment DSSSL is the best choice for transforming XML into documents for print purposes. The nearest W3C-blessed alternative, XSL formatting objects, is still in development. Offering easily maintainable output to HTML, RTF, TeX, and MIF, DSSSL is particularly well-suited for documents targeted to multiple output media.