On Display: XML Web Pages with Mozilla

March 29, 2000

Simon St. Laurent



Table of Contents

Working with XML in Mozilla
Getting Started with XML and CSS
Connecting Style Sheets to XML Documents
The Document Backbone: CSS2 display
A More Complex Example
Adding Basic Links and Images
Next Steps

Direct display of XML in a web browser is finally becoming a reality. This article is the first of a series in which we will examine XML support in the Mozilla, Opera, and Internet Explorer browsers. -- E.D.

Although Cascading Style Sheets Level 2 provides a solid set of tools for presenting XML documents in web browsers, web developers have been waiting a very long time for an implementation that lets them really use their CSS skills with XML. Internet Explorer 5.0 took some credible first steps toward XML+CSS (see Tim Bray's review of Windows IE5 for details), but the latest work from Mozilla goes beyond first steps to a usable set of tools. The solid XML+CSS core and the underlying DOM support suggest that Mozilla will be a useful platform for building applications, not just web pages. Add to that a dash of XLink support, and it looks like Mozilla may be leading the pack.

Mozilla's emphasis on standards-orientation makes its implementation of XML a real pleasure to work with. Developers used to working with CSS in an HTML context have a bit of extra learning to do, as a CSS property called display is critical to presenting XML documents. This property doesn't generally receive much use in HTML. Fortunately, finding information on display isn't difficult. For the most part, I've relied on the W3C specs as documentation for writing this article, a real change from my usual practice of combing through vendor documentation and creating test cases to see if they're accurate. (In particular, I used the CSS2 Recommendation.)

There are, of course, a few bugs yet to iron out. This isn't even beta software yet, though it should be quite soon. We'll start by exploring the XML+CSS support, building some test pages that will show off what's possible, and then connecting them together with some basic links. By the end of this article, we'll have a very capable set of tools for building simple web sites, and a solid foundation for building web applications.

Working with XML in Mozilla

The infrastructure Mozilla provides for handling XML is pretty simple. The XML support in Mozilla is built on James Clark's non-validating expat parser. The output from expat is then fed into a DOM-tree builder and document structures can be styled with CSS just like HTML documents. From a casual user's perspective, well-formed XML documents with style sheets look just like HTML documents, and work like them as well. Users can navigate and print XML documents just like they do HTML documents.

Mozilla's XML support lives up to the XML specification, but does have a few quirks developers should know about. Mozilla ignores external parsed entities and entities declared outside of the actual document. If you want to use entities in Mozilla, you'll need to declare them in the internal subset. References to entities that Mozilla has ignored don't appear on screen as part of the content -- they just disappear. This behavior is perfectly legal for a non-validating parser, though it may seem inconvenient. Mozilla does provide a bin/dtd folder where you can put additional entities. This is used to provide support for things like MathML, but isn't a readily available option for most developers.

Mozilla also supports namespaces -- the HTML namespace in particular can be very useful at times. Namespace support will also be critical for XLink when support arrives for more recent XLink drafts, giving Mozilla a way to find XLink's "global attributes" for proper link processing. Mozilla's XUL tools for building user interfaces with XML also rely on namespaces in a similar fashion. A W3C working draft describing relationships between namespaces and CSS is also implemented in Mozilla.
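As a rough sketch of what the namespaces-and-CSS draft allows, a style sheet can bind a prefix to a namespace URI and then use that prefix in selectors. The prefix html and the border rule are illustrative choices; the URI is the HTML 4.0 namespace used in the catalog example later in this article. Whether a particular Mozilla build accepts exactly this syntax is an assumption worth testing.

```css
/* Bind the prefix "html" to the HTML 4.0 namespace URI (assumed to match the document's xmlns:html declaration) */
@namespace html url(http://www.w3.org/TR/REC-html40);

/* Matches only img elements in that namespace, not an img element from some other vocabulary */
html|img {border: 1px solid black;}
```

The prefix in the style sheet does not have to match the prefix in the document; only the namespace URI has to agree.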

Getting Started with XML and CSS

To get started with XML and Cascading Style Sheets, we'll build a "sandbox" example, a fairly meaningless document. The document shows off Mozilla's tools for formatting arbitrary XML content with CSS, though it more or less replicates HTML's capabilities in a different vocabulary. "Laboratory documents" are very handy for figuring out what works and what doesn't, without the pressure of a particular format. However, they are a luxury that typically only writers and trainers get to indulge in!

We'll start out with a block element containing an inline element, along with two different kinds of lists and a small table:


<?xml version="1.0" ?>
<test>
<block>This is a block element that contains
<inline>inline</inline> elements.</block>
<bulletList>
<listItem>This is a list item, contained inside of a
  bulleted list.</listItem>
<listItem>This is a second list item, also contained
  inside of a bulleted list.</listItem>
</bulletList>
<numberList>
<listItem>This is a list item, contained inside
  of a numbered list.</listItem>
<listItem>This is a second list item, also contained
  inside of a numbered list.</listItem>
</numberList>
<hidden>This element shouldn't appear at all.</hidden>
<myTable>
<tableRow>
<tableData>Howdy!</tableData>
<tableData>Adios!</tableData>
</tableRow>
<tableRow>
<tableData>Howdy!</tableData>
<tableData>Adios!</tableData>
</tableRow>
</myTable>
</test>

It's probably reasonably obvious to a human how this information should be presented, but the browser will be lost until we give it more explicit directions: the style sheet.

Connecting Style Sheets to XML Documents

XML documents don't have the link or style elements that are used in HTML to connect style information to particular documents. Instead, the W3C has defined a processing instruction that provides that information, based on the model of the HTML link element. To connect a CSS style sheet to your XML document so that the browser can find it, use a processing instruction like


<?xml-stylesheet type="text/css" href="URI"?>

where URI is the address of the style sheet. We'll use a style sheet called display1.css for our first test document. The processing instruction can go right after the XML declaration.


<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="display1.css"?>
<test>
...

You can connect to multiple style sheets when needed -- the style sheet processing instruction will work with the rules built into CSS describing the cascade. XML documents cannot use in-line styling. Unlike HTML elements, the style attribute has no particular meaning in XML. If you need to style the content of a particular element differently, your best bet is probably to assign it an ID attribute and then reference that ID value in the style sheet.
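A minimal sketch of the ID approach, assuming the document contains something like <block id="lead">...</block> (the attribute value lead is hypothetical). CSS2 offers two ways to reach it: an attribute selector, which only looks at the attribute's name and value, and an ID selector, which depends on the browser knowing that the attribute is of type ID (for instance, through a declaration in the internal subset).

```css
/* Attribute selector: matches a block element whose id attribute equals "lead" */
block[id="lead"] {font-size: larger;}

/* ID selector: matches the element whose ID-typed attribute is "lead";
   this only works if the browser knows which attribute is the ID */
#lead {font-size: larger;}
```

Either rule alone is enough; which one a given browser honors for arbitrary XML is worth checking before relying on it.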

Next step: building a backbone for the document's presentation using CSS2's display property.

The Document Backbone: CSS2 display

CSS1 was designed to work alongside the HTML vocabulary, which came with its own formatting semantics. Paragraphs were separated from other text by line breaks, while bold text would just flow with the surrounding text. An intricate set of rules could be used to construct all kinds of complex table structures. CSS2, designed at least partly for work outside of HTML, gives developers a chance to identify those structures for their own vocabularies. This means we can take the sample document above and create a foundation set of rules for it, on which we can then layer the rest of our formatting through CSS.

The display property defines how element content fits within a flow of text, based on a set of rules for creating (or not creating) boxes on the page. Effectively, it lets you map your elements to a set of types based on structures used by HTML, though some of the types are new. The structures you describe with the display property provide the base on which you can layer all your other formatting.

The two most commonly used values for the display property are block and inline. Setting the display property's value to block specifies that the element should be treated as its own block of text, not flowed together with the preceding and following content. The value inline means the opposite -- no block is created, and the textual content of the element is flowed with the text before and after. A third basic option, none, means that none of the content whatsoever is shown, and the element and its contents are invisible, leaving no trace on the document presentation.

Most of the other values are variations on block and inline. The list-item value, for instance, creates a block to present an item within a list, and an inline block within that contains the content of the list item. The marker, run-in, and compact values may behave as block or inline, depending on context. The rest of the options describe tables (which may occupy blocks, or be inline within a containing block) and components within those tables. This relatively small set of descriptions includes enough flexibility to describe most document flows easily.
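The context-dependent values are easier to picture with a small sketch. The element names heading, para, term, and definition below are hypothetical, and the rules follow the CSS2 Recommendation's descriptions; they have not been checked against a Mozilla build here.

```css
/* run-in: the heading flows into the start of the following block if one follows;
   otherwise it is rendered as a block of its own */
heading {display: run-in; font-weight: bold;}
para {display: block;}

/* compact: the term sits in the left margin of the following block if it fits there;
   otherwise it is rendered as a block of its own */
term {display: compact;}
definition {display: block; margin-left: 4em;}
```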

To see what this looks like in practice, we'll build a style sheet that presents the XML document above in the format inferred by its elements' names. First, we'll make the element type named block behave as a block element, and make the inline element type behave as an inline element. So that we can see what we're doing in the results, we'll also make the inline element bold.


block {display:block}
inline {display:inline; font-weight:bold;}

The lists are next. Lists, as in HTML, are typically built out of a list container and then the items within the list. Most of the formatting for the list is done in the containing element. For the bulletList, we'll specify a display property of block, along with a left indent of forty pixels and a style of disc, which will produce round bullets to the left of the item text. For listItems within the bulletList element, we'll just set the display property to list-item.


bulletList {display:block; margin-left:40px;
  list-style-type:disc;}
bulletList listItem {display:list-item;}

The numberList element and listItem element types within that element will get similar formatting, except that we'll change the list-style-type so that the list gets assigned numbers rather than bullets. We'll need to add additional information to generate the counters, using the counter-reset and counter-increment properties along with the counter() function in the content property. (Leaving out the counter information produces zeros in your documents.)


numberList {display:block; margin-left:40px;
  list-style-type:decimal; counter-reset: item;}
numberList listItem {display:list-item; }
numberList {
  content: counter(item);
  counter-increment: item;
}
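For comparison, here is a sketch of numbering done purely with generated content, following the pattern in the CSS2 Recommendation's section on counters. In CSS2 the content property applies only to the :before and :after pseudo-elements, so the counter is incremented and printed on each list item's :before. This is an alternative to list-style-type:decimal rather than the style sheet used for the figures in this article, and whether a given Mozilla build renders it is an assumption to test.

```css
/* Start a fresh counter named "item" for every numberList */
numberList {display:block; margin-left:40px; counter-reset: item;}

/* Plain blocks here, since the number comes from generated content rather than a list marker */
numberList listItem {display:block;}

/* Increment the counter for each item and print it, followed by a period and a space */
numberList listItem:before {
  content: counter(item) ". ";
  counter-increment: item;
}
```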



The hidden element will live up to its name with a display property value of none. It will disappear entirely from the presentation.


hidden {display:none;}

Finally, we'll build an extremely simple table, using only the display property values of table, table-row, and table-cell.


myTable {display:table;}
tableRow {display:table-row;}
tableData {display:table-cell;}

The results aren't exactly gorgeous, but the CSS display property does what it's supposed to: build a backbone for presentation structure, as shown below.


Figure 1 - Mozilla understands the display property values used.

Unfortunately, Internet Explorer 5.01 doesn't understand nearly as much about the display property, picking up only on the block, inline, and none settings. Note, for instance, that the indents on the list items are inherited from their containing elements' style, but the list items themselves are treated as inline.


Figure 2 - Internet Explorer 5.01 is a little behind on support for the display property, though some things work.

Using this basic framework, you can build some pretty amazing document structures, though it's still basically the same set of things you could do with plain old HTML. That shouldn't be discounted, however. Even if all you want to do is create a localized vocabulary for marking up documents, this is handy, and the ability to map CSS presentation structures to arbitrary markup means that it's easy to build quick viewers for information.

A More Complex Example

Example Files

display1.xml
display1.css
books1.xml
books1.css
books2.xml
books2.css

The catalog listing below represents a simple table structure:


<?xml version="1.0"?>
<catalog>
<book>
<author>Simon St. Laurent</author>
<title>XML Elements of Style</title>
<pubyear>2000</pubyear>
<publisher>McGraw-Hill</publisher>
<isbn>0-07-212220-X</isbn>
<price>$29.99</price>
</book>
<book>
<author>Elliotte Rusty Harold</author>
<title>XML Bible</title>
<pubyear>1999</pubyear>
<publisher>IDG Books</publisher>
<isbn>0764532367</isbn>
<price>$49.99</price>
</book>
<book>
<author>Robert Eckstein</author>
<title>XML Pocket Reference</title>
<pubyear>1999</pubyear>
<publisher>O'Reilly and Associates</publisher>
<isbn>1-56592-709-5</isbn>
<price>$8.95</price>
</book>
</catalog>

We'll turn it into a table with a three-line style sheet:


catalog {display:table;}
book {display:table-row;}
book * {display:table-cell; padding:5px;}



The asterisk in the last line indicates that every element inside a book element should be treated as a table cell -- there's no need to list all the possible child elements. Because this is a descendant selector, it matches elements at any depth within book, not just direct children; in this document the two are the same, since the children of book contain only text. (This is especially useful when you have different behaviors for elements both inside and outside of tables.) The results of this tiny style sheet aren't exactly exquisite, but they make it much easier to work with the information.


Figure 3 - Simple book catalog
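The rule book * is a descendant selector, so it would also turn any elements nested more deeply inside a book into table cells. If only the direct children should become cells, CSS2's child selector is the tool. The following variation is a sketch of that idea; support for the > combinator in any particular Mozilla milestone is not verified here.

```css
catalog {display:table;}
book {display:table-row;}

/* Only elements that are immediate children of book become cells;
   anything nested inside those children keeps its own display value */
book > * {display:table-cell; padding:5px;}
```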

Once you've built the foundation structures for your document flow, all of CSS's tools for formatting content -- margins, fonts, borders, generated content, and more -- are available. They work the same way with XML that they worked with HTML, though without the background information that HTML always provided. Every XML element -- except those explicitly placed in the HTML namespace -- is a blank slate.
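As an illustration of layering formatting on top of the table foundation, the catalog style sheet could be extended along the following lines. The particular fonts, borders, and generated text are arbitrary choices made for this sketch, not the style sheet behind the figures.

```css
/* Foundation: the same table structure as before, plus a font and margin for the whole catalog */
catalog {display:table; font-family:sans-serif; margin:10px;}
book {display:table-row;}
book * {display:table-cell; padding:5px; border:1px solid gray;}

/* Per-element formatting layered on top of the structure */
title {font-style:italic;}
price {text-align:right; font-weight:bold;}

/* Generated content: a label inserted in front of each price */
price:before {content: "Price: ";}
```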

If document flows don't meet your needs, Mozilla also offers CSS2 positioning. By combining the position property with values for the top, left, height, and width properties, you can lay out your document content on a pixel-by-pixel basis. The float property lets you specify how the rest of the content should flow around these and other blocks in the text. Mozilla M14 seems to lock up its scroll bars as soon as position:fixed is used, but its support for the rest of positioning appears to work smoothly.
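A small sketch of what positioning and floating rules look like, using two hypothetical element types, sidebar and pullquote, that are not part of the example documents. The pixel values are arbitrary.

```css
/* Take the sidebar out of the normal flow and pin it to exact coordinates */
sidebar {
  display:block;
  position:absolute;
  top:20px; left:400px;
  width:200px; height:300px;
}

/* Push the pullquote to the right edge and let the following text wrap around it */
pullquote {display:block; float:right; width:150px; margin:10px;}
```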

Adding Basic Links and Images

While Mozilla doesn't support the entire XLink vocabulary -- a reasonable decision given XLink's uncertain status as a working draft -- it supports enough of it to let users create simple links in their documents. The syntax dates back to the March 1998 draft, now thoroughly outdated. It doesn't support enough of XLink to move beyond the HTML IMG element, so adding images to documents requires using the HTML namespace. We'll take a look at Mozilla's support for XLink by adding some pictures and links to the book example.

The new document looks much like its predecessor, but has one extra column containing linking and image information, along with some additional linking attributes on the title elements containing the book titles (for reasons of space, I've abbreviated the image URLs; see books2.xml for the full file):


<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="books2.css"?>
<catalog xmlns:html="http://www.w3.org/TR/REC-html40" >
<book>
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=007212220X/">
<html:img src="http://images.amazon.com/..." />
</cover>
<author>Simon St. Laurent</author>
<title xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=007212220X/">
XML Elements of Style
</title>
<pubyear>2000</pubyear>
<publisher>McGraw-Hill</publisher>
<isbn>0-07-212220-X</isbn>
<price>$29.99</price>
</book>
<book>
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=0764532367/">
<html:img src="http://images.amazon.com/..." />
</cover>
<author>Elliotte Rusty Harold</author>
<title xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=0764532367/">
XML Bible
</title>
<pubyear>1999</pubyear>
<publisher>IDG Books</publisher>
<isbn>0764532367</isbn>
<price>$49.99</price>
</book>
<book>
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=1565927095/">
<html:img src="http://images.amazon.com/..." />
</cover>
<author>Robert Eckstein</author>
<title xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=1565927095/">
XML Pocket Reference
</title>
<pubyear>1999</pubyear>
<publisher>O'Reilly and Associates</publisher>
<isbn>1-56592-709-5</isbn>
<price>$8.95</price>
</book>
</catalog>

The root catalog element now defines a namespace for HTML 4.0:

<catalog xmlns:html="http://www.w3.org/TR/REC-html40" >

A new cover element contains both a link to a place where the book can be purchased, and an HTML img element that lets us reference and display the image:

<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=007212220X/">
<html:img src="http://images.amazon.com/..." />
</cover>

The xml:link attribute identifies the cover element as a simple XLink, the show attribute indicates that the referenced document should replace the current page in the browser window, and the href identifies the document targeted by the link. In the latest draft of XLink, the XLink namespace would have to be declared, xlink:type would replace xml:link, and xlink:show and xlink:href would be used in place of show and href, respectively.

The html:img element behaves exactly like an img element in HTML, with a src attribute identifying the location of the image to be displayed. All the other features of the HTML img element, like the alt, height, and width attributes, are still available and work as they do within HTML.

The title element carries the same XLink information attributes as the cover element. While Mozilla will format HTML a elements as links, it doesn't presently do so for information linked using XLink. We'll update the style sheet to highlight the title in a rather traditional way:


catalog {display:table; }
book {display:table-row; }
book * {display:table-cell; padding:5px;}
title {color:blue; text-decoration:underline;}

The result is a table of books with book jackets displayed and links to a sales process established:


Figure 4 - Adding linked images into the catalog
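The title rule names one element type explicitly. A more general sketch keys the link styling on the presence of an href attribute, so any element that carries a link picks up the same look. It uses a CSS2 attribute selector and the cursor property; how much of this a given Mozilla milestone honors is an assumption to verify.

```css
/* Any element with an href attribute, whatever its name, is dressed up as a link */
*[href] {color:blue; text-decoration:underline; cursor:pointer;}
```

In the catalog above, this would match both the cover and title elements, since each carries an href attribute.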

Next Steps

We haven't built any masterpieces here, but the foundation is sound. Mozilla is now capable of displaying XML on par with HTML, and the extra flexibility that style sheets provide makes it a genuine contender as a tool for creating web documents (or will someday when more users have the software!). There's a lot more CSS to explore in Mozilla, as well as tools that connect CSS to the DOM.

While Mozilla isn't yet polished, and some aspects of its development (XLink in particular) could use an update, it's a solid enough platform to be well worth exploring. By providing a tool for reading and exploring XML documents that works on a wide variety of platforms, and has an open source license to boot, the Mozilla project has definitely made a contribution to the XML community.