DSSSL for XML: Why not?
May 2, 2000
In his recent article on XML.com, Fabio Arciniegas A. explained the "grove." This week, I'll talk about a grove processing and styling language, DSSSL (pronounced "deesel"). DSSSL stands for Document Style Semantics and Specification Language. Like groves, DSSSL is another technology with its roots in SGML, and it sometimes suffers from bad press in the XML world. In fact, as you'll notice after reading this article, a DSSSL style sheet can be as simple to read and understand as a CSS style sheet. Both are rule-based languages, as indeed is XSLT.
DSSSL is mainly intended for processing SGML documents but, since XML 1.0 documents are also SGML documents, they can be processed by a DSSSL engine. The OpenJade DSSSL implementation can be used to style DTD-less well-formed XML documents, as you would with CSS or XSLT. If you use XSLT, you'll find that DSSSL has some familiar processing concepts.
Ready to explore the strange lands of list-based languages? Afraid of parentheses? Don't worry, the DSSSL monster is in fact a gentle creature. I'll start by introducing the OpenJade project.
The OpenJade project
OpenJade is an open-source project that originated with the DSSSL mailing list members. The first code base was provided by James Clark, who was also the editor of the W3C's XSLT 1.0 Recommendation. Since then, a small group of developers has improved the code and some, like Matthias Clasen, have invested a lot of their time adding new features and resolving bugs.
Basically, OpenJade is a DSSSL interpreter implemented in C++. The code is freely available from the CVS server hosted by SourceForge. OpenJade has been successfully compiled on multiple platforms such as Linux, Solaris, and Windows. More information about the project and useful pointers to DSSSL resources are available at the OpenJade home page.
The OpenJade engine's architecture is organized as a hierarchy of modules, as shown in the figure below. The XML document is first parsed by the SGML/XML parser, then transformed into a grove. The DSSSL engine then transforms the document using a DSSSL style sheet and one of the styling drivers.
A key concept that you need to grasp is that of formatting objects. These are the basic building blocks of page layout, each having a representation in the target formatting language (such as HTML or RTF).
Example XML document
The XML document we'll use as our example is a simple book catalog document, as shown below.
You can download this document here.
The SGML/XML parser
The parser is an SGML parser, able to parse XML documents with or without validation. However, the current version of OpenJade still needs an SGML declaration file to be included in the command line even if the DTD is not required, as illustrated in the command line used to process our example:
openjade -t html -wno-valid xml.dcl -d catalog.dsl catalog.xml
The command line states that the styling driver to be used is the HTML driver (-t html), and that no validation is to be performed on the XML document (-wno-valid). Because the parser is still an SGML parser, it needs the SGML declaration for the XML format (xml.dcl). The DSSSL style sheet is specified with the -d catalog.dsl option. (You can download catalog.dsl here.) The final argument to the command line is the path of the XML document to process. If you want to transform into output formats other than HTML, alter the -t option (e.g., rtf for Rich Text Format output). PDF styling is currently a work in progress.
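If you run the same style sheet against several output formats, it can help to assemble the command line in a small script. The following is a minimal Python sketch under stated assumptions: the helper name openjade_command is hypothetical, it only builds the argument list (it does not run OpenJade), and it uses only the options shown in this article, in the same order.

```python
import shlex


def openjade_command(backend, stylesheet, document, declaration="xml.dcl"):
    """Build the argument list for one OpenJade run.

    Hypothetical helper: it mirrors the command line shown in the
    article and uses no options beyond the ones that appear there.
    """
    return [
        "openjade",
        "-t", backend,      # styling driver, e.g. "html" or "rtf"
        "-wno-valid",       # do not validate the XML document
        declaration,        # SGML declaration for the XML format
        "-d", stylesheet,   # DSSSL style sheet
        document,           # XML document to process
    ]


if __name__ == "__main__":
    for backend in ("html", "rtf"):
        # shlex.join renders the list as a copy-and-paste shell command
        print(shlex.join(openjade_command(backend, "catalog.dsl", "catalog.xml")))
```

To actually launch the process you could hand the list to subprocess.run, assuming the openjade executable and xml.dcl can be found on your system.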
The Grove Manager
The grove manager is the "event sink" for parsing events fired by OpenSP or SP (this happens in a similar way to event-based XML parser mechanisms such as SAX). The grove manager interfaces with the DSSSL engine, which uses the grove to fire "construction rules." A construction rule matches on a grove node, and the rule's body provides the instructions used to build a flow object tree.
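To illustrate the "event sink" idea, here is a toy model in Python built on the standard xml.sax module. It is only an analogy, not OpenJade's grove manager: the class name GroveBuilder, the dictionary node layout, and the sample document are all assumptions made for this sketch, and a real grove carries far more information than this tree does.

```python
import xml.sax


class GroveBuilder(xml.sax.ContentHandler):
    """Toy event sink: turns parser events into a tree of dict nodes."""

    def __init__(self):
        super().__init__()
        self.root = {"name": "#document", "children": []}
        self._stack = [self.root]   # path from the root to the open element

    def startElement(self, name, attrs):
        node = {"name": name, "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Simplification: whitespace-only text is dropped, and text
        # delivered in several chunks becomes several string children.
        if content.strip():
            self._stack[-1]["children"].append(content.strip())


def build_grove(xml_text):
    builder = GroveBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.root


# Hypothetical sample; the element names follow the article's style sheet.
SAMPLE = "<catalog><book><title>Example Title</title><price>10.00</price></book></catalog>"
grove = build_grove(SAMPLE)
print(grove["children"][0]["name"])
```

The parser pushes events and the sink owns the resulting tree, which is roughly the division of labor described above between the parser and the grove manager.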
The DSSSL Style Engine
The DSSSL engine module takes as its input the XML document's grove and the DSSSL style sheet document, itself an SGML document, as shown below. (Because SGML allows you to omit tags, no elements are required to enclose the DSSSL construction rules.)
<!doctype style-sheet PUBLIC "-//Netfolder//DTD DSSSL library//EN">

(root
  (make scroll
    (process-children)))

(element book
  (make paragraph
    start-indent: 30pt
    font-family-name: "Times New Roman"
    font-size: 12pt
    font-weight: 'medium
    space-after: 10pt
    (process-matching-children "title")
    (process-matching-children "category")
    (process-matching-children "code")
    (process-matching-children "price")))

(element title
  (make paragraph
    font-family-name: "Arial"
    font-size: 14pt
    font-weight: 'bold))

(element category
  (make paragraph
    (literal "category:")
    (process-children)))

(element code
  (make paragraph
    (literal "code:")
    (process-children)))

(element price
  (make paragraph
    (literal "price:")
    (process-children)))
The Element Construction Rule
The element construction rule is quite simple. The rule's pattern-match component contains the element to be matched. The DSSSL engine will match a grove element's node against an element construction rule. When a match occurs, the rule body is executed, building a flow object tree.
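The match-then-build cycle can be modeled in a few lines of Python. This is a sketch for intuition only, not how OpenJade works internally: the rule table RULES, the element decorator, and the tuple representation of flow objects are all hypothetical. It assumes that an element with no rule of its own simply has its children processed.

```python
import xml.etree.ElementTree as ET

RULES = {}  # element name -> rule body (a Python function)


def element(name):
    """Register a rule body for elements called `name`."""
    def register(body):
        RULES[name] = body
        return body
    return register


def process_children(node):
    """Collect the node's text, then the flow objects built for each child."""
    out = []
    if node.text and node.text.strip():
        out.append(node.text.strip())
    for child in node:
        out.extend(process_node(child))
    return out


def process_node(node):
    # Assumption: with no matching rule, fall back to processing the children.
    rule = RULES.get(node.tag, process_children)
    return rule(node)


@element("title")
def title_rule(node):
    return [("paragraph", {"font-weight": "bold"}, process_children(node))]


@element("price")
def price_rule(node):
    return [("paragraph", {}, ["price:"] + process_children(node))]


# Hypothetical sample element, not the article's catalog document.
doc = ET.fromstring("<book><title>Example Title</title><price>10.00</price></book>")
tree = process_node(doc)
print(tree)
```

Each rule body returns a piece of the output tree, and process_children is what hands control back to the matching loop, which is the role it plays in the style sheet as well.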
We will transform our book catalog document into both HTML and RTF. For both transformations, we'll use much the same flow objects, with the exception of the main document container. For HTML, the main document container will be scroll (for scrolled visual media); for RTF, it will be simple-page-sequence.
The main document container will be specified by the first construction rule. This rule matches the document grove node:

(root
  (make scroll
    (process-children)))

The process-children construct instructs the DSSSL engine to process all of the node's child nodes. This construct is equivalent to the XSLT <xsl:apply-templates/> element. In both languages, the styling engine tries to match the document's nodes either to templates (in XSLT) or to construction rules (in DSSSL). The <book> element is then matched with its corresponding construction rule:
(element book
  (make paragraph
    start-indent: 30pt
    font-family-name: "Times New Roman"
    font-size: 12pt
    font-weight: 'medium
    space-after: 10pt
    (process-matching-children "title")
    (process-matching-children "category")
    (process-matching-children "code")
    (process-matching-children "price")))
Each time a <book> element is matched in the grove, the DSSSL engine "makes" a paragraph object. The paragraph's visual appearance is specified by a set of properties. In the above style sheet fragment, we indicate that the paragraph is indented by 30 points (start-indent: 30pt), that its font is set to 12 point Times New Roman, medium-weight (font-family-name: "Times New Roman", font-size: 12pt, font-weight: 'medium), and finally that the space after the paragraph is to be 10 points (space-after: 10pt).
All subsequent child formatting objects will inherit these properties. So, the formatting objects for the <title>, <category>, <code>, and <price> elements will be displayed according to these properties, except where an element's own rule overrides them (as the title rule does for the font). The construction rules matching these elements are fired by the process-matching-children construct. As in XSLT, specific elements to process can be specified. So, for instance, the construct (process-matching-children "title") will cause the DSSSL engine to match any <title> elements that are children of the <book> element.
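Two ideas are at work here: properties flow down from the parent's formatting object, and the rule, not the source document, decides the order in which children appear. The Python sketch below models both under simplifying assumptions. The names BOOK_PROPS, OWN_PROPS, and format_book are hypothetical, only the font properties are modeled, and the sample element deliberately lists the price before the title.

```python
import xml.etree.ElementTree as ET

# Font properties set by the book rule in the style sheet.
BOOK_PROPS = {
    "font-family-name": "Times New Roman",
    "font-size": "12pt",
    "font-weight": "medium",
}

# Properties a child's own rule sets; only the title rule sets any.
OWN_PROPS = {
    "title": {"font-family-name": "Arial", "font-size": "14pt", "font-weight": "bold"},
}


def process_matching_children(parent, name, inherited):
    """Format only the children called `name`; own properties win over inherited ones."""
    result = []
    for child in parent:
        if child.tag == name:
            props = {**inherited, **OWN_PROPS.get(name, {})}
            result.append((name, props["font-family-name"], props["font-size"], child.text))
    return result


def format_book(book):
    out = []
    # The output order comes from this sequence of calls, not from the source.
    for name in ("title", "category", "code", "price"):
        out.extend(process_matching_children(book, name, BOOK_PROPS))
    return out


# Hypothetical sample in which the price precedes the title.
book = ET.fromstring("<book><price>10.00</price><title>Example Title</title></book>")
formatted = format_book(book)
for line in formatted:
    print(line)
```

The title comes out first and in Arial, while the price keeps the inherited Times New Roman, which is the behavior the style sheet asks for.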
Finally, the elements contained in the <book> element are processed by a construction rule matching the contained elements. Here's an example for <category>:
(element category
  (make paragraph
    (literal "category:")
    (process-children)))
The literal construct is used here to insert a string before the <category> data content. The element's data content is included in the resulting output tree by the process-children construct.
The OpenJade engine allows you to create output in different formats with nearly the same style sheet. The only construction rule to be modified is the root rule. For HTML output, the root rule "makes" a scroll object. For fixed page formats like RTF, TeX, or MIF (FrameMaker), the root rule "makes" a simple-page-sequence object. This is the only modification required, as the other formatting objects (like the paragraph) are mapped directly to the different rendering languages' constructs.
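Since only the root rule changes between targets, a build script could generate it. Here is a small Python sketch of that idea; the function names and the PAGED_BACKENDS set are hypothetical, and the backend names "tex" and "mif" are assumed to follow the same pattern as the "html" and "rtf" values of the -t option shown earlier.

```python
# Backends treated as fixed-page formats in this sketch (an assumption).
PAGED_BACKENDS = {"rtf", "tex", "mif"}


def root_flow_object(backend):
    """Pick the main document container for a given styling driver."""
    return "simple-page-sequence" if backend in PAGED_BACKENDS else "scroll"


def root_rule(backend):
    """Return the text of a DSSSL root rule for the given backend."""
    return "(root (make {} (process-children)))".format(root_flow_object(backend))


if __name__ == "__main__":
    for backend in ("html", "rtf"):
        print(backend, "->", root_rule(backend))
```

The generated text could then be combined with the unchanged element rules to produce one style sheet per target.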
The result of the transformation into HTML, shown below, is in fact two documents created by the OpenJade engine: an HTML document, and a CSS document. The style applied to each HTML formatting object is simply specified by the CSS document. The end result can be displayed with any browser able to interpret CSS styling.
As you probably noticed, DSSSL has some traits redolent of both CSS and XSLT. As in CSS, each formatting object is specified with a set of properties. As in XSLT, processing is controlled by a processing construct (process-matching-children). This construct allows you to change the display order and thus separate the view from the model.
So, when can you use DSSSL? At the moment DSSSL is the best choice for transforming XML into documents for print purposes. The nearest W3C-blessed alternative, XSL formatting objects, is still in development. Offering easily maintainable output to HTML, RTF, TeX, and MIF, DSSSL is particularly well-suited for documents targeted to multiple output media.