Opening the E-Book

October 18, 2000

Didier Martin

It's 7 am. As usual, the bus is full of people. Some seem to be sleeping still, others are hidden behind their newspapers and really don't care if, every day, the forests are depleted of resources accumulated throughout centuries.

Then I notice the gentleman sitting next to me as he extracts a small palmtop computer from his pocket and taps it twice. I take a discreet look at the screen and, to my surprise, notice that he's reading a book, one made not of paper but of electronic bits. He's reading an e-book.

No More Dead Trees

On September 16th, 1999, the Open e-Book Authoring Group published a document named "Open e-Book Publication Structure 1.0". This document can be downloaded at http://www.openebook.com. (It is also available as an e-book.) However, although the content specification is a standard, each reader accepts only its own file format: an OEB (Open E-Book) package has to be processed by a publishing tool before a particular reader can consume it.

Some readers, such as Microsoft's, run on multiple platforms, from desktops to palmtops. The Microsoft reader is freely available. Microsoft says it will publish the Microsoft Reader Content SDK, which is needed to package e-books as "lit" files, the format understood by Microsoft's reader.

E-books can also be read on PalmOS computers like the Palm Pilot or Handspring Visor. Another free e-book reader is available from Mobipocket, which also offers an e-book publishing kit. The kit helps you create an OPF file, the e-book package file: the glue that ties all the elements in the book together. To my knowledge, Mobipocket's is the only publicly available kit that can process an OEB 1.0 package document.

Naturally, the OEB package document is XML. We'll examine its internal structure below.

The OEB Specifications Are Based on XML

An OEB document should be an XML 1.0 document that meets the following requirements:

  • it is a well-formed XML document,
  • it conforms fully to the OEB document DTD,
  • it conforms to XHTML 1.0 when that specification is issued, and
  • it is viewable in version 4 HTML browsers.
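
The first requirement, well-formedness, is easy to check mechanically. Here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser; the function name is_well_formed is made up for illustration. It only checks well-formedness. It does not validate against the OEB DTD, which needs a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# HTML-style markup with an unclosed <p> is not well-formed XML.
print(is_well_formed("<html><body><p>Hello World!</p></body></html>"))  # True
print(is_well_formed("<html><body><p>Hello World!</body></html>"))      # False
```

Ordinary HTML 4 pages often fail this test because of unclosed tags, which is why a conversion step is needed before packaging.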

The OEB specification defines two XML DTDs: the package DTD and the basic OEB document DTD. Basic OEB documents are used to encode the book's contents. In the current 1.0 specification, although the OEB elements are not given a namespace, the dc: namespace prefix is required for all Dublin Core metadata. A minimal e-book involves at least two documents: a package document and a content document.

The Package Document

The OEB 1.0 specification recommends that the package file use the extension "opf". Package files have text/xml as their MIME media type since they are XML 1.0 compliant documents. The whole package (i.e. the package document and the content documents) can be formatted in different ways. For instance, Mobipocket books are packaged as ".prc" files for PalmOS, and Microsoft's e-books are packaged as ".lit" files.

The package file, which specifies the OEB documents, images, and other objects, and their relationships, is structured thus:


<?xml version="1.0"?>
<!DOCTYPE package
     PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0 Package//EN"
     "http://openebook.org/dtds/oeb-1.0/oebpkg1.dtd">
<package unique-identifier="XYZ">
     metadata
     manifest
     spine
     guide
</package>

The Package

Inside the <package> element, all metadata about the OEB publication is enclosed in a <metadata> element, as in


<metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
  <dc:Title>The Call of the Wild</dc:Title>
  <dc:Creator>Jack London</dc:Creator>
  <dc:Date>5/7/00</dc:Date>
  <dc:Identifier id="XYZ" scheme="ISBN">123456789</dc:Identifier>
</metadata>

As you can see, the Dublin Core elements are identified by the dc: namespace prefix, and the URI associated with that namespace is declared on the <metadata> element. The publication's unique identifier is defined in the metadata through the <dc:Identifier> element, here using the ISBN scheme; its id ("XYZ") is the value named on the <package> element. The usual Dublin Core metadata elements can be included in the <metadata> element:

  • dc:Title
  • dc:Creator
  • dc:Subject
  • dc:Description
  • dc:Publisher
  • dc:Contributor
  • dc:Date
  • dc:Type
  • dc:Format
  • dc:Identifier
  • dc:Source
  • dc:Language
  • dc:Relation
  • dc:Coverage
  • dc:Rights
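
To show how a tool might read this metadata, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The function name read_metadata and the trimmed-down PACKAGE string are made up for illustration. The sketch assumes the attribute on <package> is spelled unique-identifier, and it reuses the namespace URI from the example above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the <metadata> example in the article.
DC = "http://purl.org/dc/elements/1.0/"

# Hypothetical, trimmed-down package document used only for this sketch.
PACKAGE = """<package unique-identifier="XYZ">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
    <dc:Title>The Call of the Wild</dc:Title>
    <dc:Creator>Jack London</dc:Creator>
    <dc:Identifier id="XYZ" scheme="ISBN">123456789</dc:Identifier>
  </metadata>
</package>"""


def read_metadata(package_xml):
    """Return (title, creator, unique identifier text) from a package document."""
    root = ET.fromstring(package_xml)
    ns = {"dc": DC}
    title = root.findtext("metadata/dc:Title", namespaces=ns)
    creator = root.findtext("metadata/dc:Creator", namespaces=ns)
    # The package names the id of the <dc:Identifier> that identifies the book.
    wanted = root.get("unique-identifier")
    uid = None
    for ident in root.findall("metadata/dc:Identifier", ns):
        if ident.get("id") == wanted:
            uid = ident.text
    return title, creator, uid


print(read_metadata(PACKAGE))
```

The prefix itself is not what matters to the parser. The lookup works through the namespace URI, so the dc: prefix in the query is mapped to that URI explicitly.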

The Manifest

The <manifest> element provides a list of all the files that are part of the e-book. It contains one or more <item> elements. An item refers to a document, an image, a stylesheet, or any other object that forms part of the publication. An item's attributes are a unique identifier (e.g. id="intro"); a hyperlink reference (e.g. href="introduction.html"); and a MIME media type (e.g. media-type="text/x-oeb1-document").

Here's an example.


<manifest>
  <item id="cover" href="cover.htm" media-type="text/html"/>
  <item id="toc" href="TOC_call.htm" media-type="text/html"/>
  <item id="bio" href="bio.htm" media-type="text/x-oeb1-document"/>
  <item id="call" href="callofwild.htm" media-type="text/html"/>
</manifest>

A given reader may be unable to read or interpret an item. To allow for this, a fallback attribute can be added to an <item>. The fallback points to another item, identified by its id.


<manifest>
  <item id="cover" href="smallCover.svg"
    media-type="text/svg" fallback="coverHTML"/>
  <item id="coverHTML" href="cover.htm" media-type="text/html"/>
  <item id="toc" href="TOC_call.htm" media-type="text/html"/>
  <item id="bio" href="bio.htm" media-type="text/x-oeb1-document"/>
  <item id="call" href="callofwild.htm" media-type="text/html"/>
</manifest>

In the example above, the cover page of the book is by default an SVG document. However, if the e-book reader cannot interpret SVG documents, it falls back to an HTML cover page.
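
The fallback mechanism can be pictured as following a chain of items until one is usable. The sketch below is one possible reading of that behavior, not the algorithm of any particular reader. The function resolve_item and the supported_types parameter are hypothetical names, and the MANIFEST string is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the manifest example, kept to the two cover items.
MANIFEST = """<manifest>
  <item id="cover" href="smallCover.svg" media-type="text/svg" fallback="coverHTML"/>
  <item id="coverHTML" href="cover.htm" media-type="text/html"/>
</manifest>"""


def resolve_item(manifest_xml, item_id, supported_types):
    """Follow fallback attributes until an item with a supported media type is found.

    Returns the href of that item, or None if the chain ends (or loops) first.
    """
    items = {item.get("id"): item for item in ET.fromstring(manifest_xml).iter("item")}
    seen = set()  # guards against a fallback chain that points back at itself
    while item_id in items and item_id not in seen:
        seen.add(item_id)
        item = items[item_id]
        if item.get("media-type") in supported_types:
            return item.get("href")
        item_id = item.get("fallback")
    return None


print(resolve_item(MANIFEST, "cover", {"text/html"}))              # cover.htm
print(resolve_item(MANIFEST, "cover", {"text/html", "text/svg"}))  # smallCover.svg
```

A reader that understands SVG gets the SVG cover. One that only handles HTML ends up at cover.htm.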

The Spine

The <spine> defines the linear reading order of the publication. Each <itemref> points, through its idref attribute, to the id of an item in the manifest. In the example below, the e-book reader will first present the cover, then the table of contents, then "The Call of the Wild" itself, and finally the author's biography.


<spine>
  <itemref idref="cover"/>
  <itemref idref="toc"/>
  <itemref idref="call"/>
  <itemref idref="bio"/>
</spine>
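
Since the spine only holds references, a tool has to join it with the manifest to find the actual files. Here is a minimal Python sketch of that join, using the standard library's xml.etree.ElementTree. The function name reading_order is hypothetical, and the PACKAGE string contains only the manifest and spine from the examples in this article.

```python
import xml.etree.ElementTree as ET

# Manifest and spine from the article's examples, wrapped in a bare <package>.
PACKAGE = """<package unique-identifier="XYZ">
  <manifest>
    <item id="cover" href="cover.htm" media-type="text/html"/>
    <item id="toc" href="TOC_call.htm" media-type="text/html"/>
    <item id="bio" href="bio.htm" media-type="text/x-oeb1-document"/>
    <item id="call" href="callofwild.htm" media-type="text/html"/>
  </manifest>
  <spine>
    <itemref idref="cover"/>
    <itemref idref="toc"/>
    <itemref idref="call"/>
    <itemref idref="bio"/>
  </spine>
</package>"""


def reading_order(package_xml):
    """Return the hrefs of the spine items in reading order.

    Raises ValueError if an <itemref> names an id missing from the manifest.
    """
    root = ET.fromstring(package_xml)
    hrefs = {item.get("id"): item.get("href") for item in root.iterfind("manifest/item")}
    order = []
    for ref in root.iterfind("spine/itemref"):
        idref = ref.get("idref")
        if idref not in hrefs:
            raise ValueError(f"spine refers to unknown manifest item: {idref!r}")
        order.append(hrefs[idref])
    return order


print(reading_order(PACKAGE))
# ['cover.htm', 'TOC_call.htm', 'callofwild.htm', 'bio.htm']
```

Note that the order comes from the spine, not from the manifest, where "bio" is listed before "call".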

Tours

Alternative navigation paths can be specified with the <tours> element. As its name indicates, each <tour> assembles selected points of interest, the <site> elements, into a guided tour. Thus a publication may offer the reader different ways of navigating it.


<tours>
  <tour id="tour1" title="Get to the point">
    <site title="introduction" href="callofwild.htm#r45"/>
    <site title="conclusion" href="callofwild.htm#r90"/>
  </tour>
</tours>
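
As an illustration of how a reader might turn this markup into a navigation menu, here is a small Python sketch. The function name list_tours is hypothetical. The sketch assumes each site's href has the form file#fragment, as in the example.

```python
import xml.etree.ElementTree as ET

# The <tours> example from the article.
TOURS = """<tours>
  <tour id="tour1" title="Get to the point">
    <site title="introduction" href="callofwild.htm#r45"/>
    <site title="conclusion" href="callofwild.htm#r90"/>
  </tour>
</tours>"""


def list_tours(tours_xml):
    """Map each tour title to its ordered list of (site title, file, fragment)."""
    tours = {}
    for tour in ET.fromstring(tours_xml).iter("tour"):
        stops = []
        for site in tour.iter("site"):
            # Split "file#fragment" into the document and the anchor inside it.
            path, _, fragment = site.get("href").partition("#")
            stops.append((site.get("title"), path, fragment))
        tours[tour.get("title")] = stops
    return tours


for title, stops in list_tours(TOURS).items():
    print(title, stops)
```

Both stops of this tour land in the same file, callofwild.htm, at two different anchors.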

The Guide

A <guide> element identifies the structural components of the publication: table of contents, cover, illustrations, biography, etc. In our sample publication there are two structural components: a table of contents and a biography.


<guide>
  <reference type="toc" title="Table Of Contents"
    href="TOC_call.htm"/>
  <reference type="bio" title="Biography"
    href="bio.htm"/>
</guide>

The Book's Content

To create the content pages for a book, just use an HTML editor. Then process each page with Tidy, which will transform the HTML document into an XML document. You can download Tidy from Dave Raggett's W3C page. Don't forget that the content has to be structured using the HTML vocabulary but must be written as well-formed XML. For instance, the document below is valid content.


<?xml version="1.0"?>
<!DOCTYPE html
    PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0 Document//EN"
    "http://openebook.org/dtds/oeb-1.0/oebdoc1.dtd">
<html>
<head>
<title>The simplest e-book in the world</title>
</head>
<body>
<p>Hello World!</p>
</body>
</html>

Most of the useful features of HTML 4 can be used in e-book content: CSS stylesheets, <div>, etc. An e-book reader may even execute scripts, which means that e-books can be interactive and offer a richer experience to the user. Most of the HTML 4 vocabulary is re-used in e-books, but not all of it, so it is best to check your document with a validating XML parser (simply include the DOCTYPE declaration and have the document validated against the DTD). The e-book specifications are located at the Open E-Book site.

This Week's Experiment

If you have a palmtop computer running PalmOS, EPOC or WinCE, there's an experiment you may like to try.

a) Download the Mobipocket publisher kit (Mobipocket web site).
b) Download "Tidy" to convert your HTML files to the XML format (Tidy web site).
c) Pick one of your HTML documents and convert it to XML with Tidy.
d) Use the Mobipocket OEB package editor and file formatter to create your e-book.
e) Sync it with your palmtop computer, and read your document.

Read an e-book. Save a tree. You'll be glad you did.