The Digital Talking Book

October 16, 2002

Ken Pittman

The talking book is not new. We've enjoyed talking books for nearly as long as we've had recording devices. The question being asked today is whether XML can enable the Digital Talking Book (DTB) to provide new ways of managing information, particularly for the sighted and visually impaired communities. This article looks at the technical elements of DTBs: the standards that facilitate their use, effective information management, copyright and security issues, and future applications.

The core application of the DTB is the conversion of a book into speech. In predigital applications, a narrator reads the book and the publisher reproduces the narration for a commercial replay device. Analog output provides users with basic functionality: play, fast-forward, stop, and rewind. Any charts, graphs, and visual aids used in the original book are lost in analog, linear narrations. That's especially problematic for scientific texts. Despite significant efforts by Recording For the Blind & Dyslexic (RFBD) and the National Library Service for the Blind and Physically Handicapped (NLS), a division of the U.S. Library of Congress, fewer than 10% of published books ever make it into an accessible format.

RFBD and NLS have continued their leadership by embracing the need for standards in digital storage and replay of books. The DAISY Consortium was formed in 1996 by talking-book libraries to lead the worldwide transition from analog to DTBs. DAISY, working with other standards bodies, developed initial standards for the use of digital books and pioneered efforts needed to move from analog replay to digital replay. Moving well beyond basic functionality, the DAISY standards introduced page numbering, class attributes (for example, page front, page normal, page special), unique IDs for each element to support W3C's SMIL (Synchronized Multimedia Integration Language), structural tags, Dublin Core and Navigational Control Center metadata, sidebars, notes, and other text navigation aids.

The advent of digital technology allows the visually-impaired community to address two major challenges to gaining full access to the wealth of printed material: narration and copyright.

Overcoming the constraint of narration

What narration could not address due to limited resources, the Open eBook Forum addressed by endorsing Text-to-Speech (TTS) technology, allowing more books to be converted into audible formats. Narration requires a real person to read the text, and the analog file was brought to market using the SMIL standard. TTS allows software to convert the electronic text into computer speech. In 1999 the eBook Publication Structure 1.0 became a standard, featuring completely accessible XML. The goal of the eBook community is to offer "completely accessible books to users suitable to their individual needs or requirements" -- books richly marked up in XML, sufficient to support multiple reading systems: print, audio, Braille, large print, indexes, bibliographies. XML also facilitates the rich functionality offered by newer player platforms. For example, Visuaide and Labyrinten offer players that support synchronized pause, spelling of individual words, and search.

Are there disadvantages in moving from the human narrator's voice to the stilted, mechanical voice of TTS? Yes, but the increased functionality and expanded library of material overcome them. "Imagine a man who is blind sitting at a PC and listening to [a] Shakespeare passage with an eBook reader using TTS. The mechanical tones are far less evocative than those of the professional narrator, but the listener who is blind can see past the quirks of the robotic voice," say George Kerscher of the DAISY Consortium and Jim Fruchterman of the Benetech Initiative.

Meeting the copyright challenge

The second challenge is copyright. Publishers enjoy a growing demand for audio versions of printed material, and often secure exclusive agreements for the entire audio version of copyrighted text. Traditionally publishers have secured these agreements for audio publication through narration-based services. The task of the Open eBook Forum and other forums supporting the DTB is to persuade publishers to consider the wider set of alternatives beyond the first-generation solution that simple narration offers. The OEB Forum tries to take advantage of the Chafee Amendment to the US copyright code, which permits the non-standard use of books by a non-profit for production of Braille or audible output. Non-profits that seek to produce documents, except plays and music, in non-standard forms such as Braille or DTBs enjoy a copyright waiver under the amendment.

Although the Chafee Amendment opens up new opportunities for publishers, the expanded functionality offered in the latest DTB standards is equally appealing to sighted customers. Publishers are calling for digital copyright protection and compensation methods for proper, attributed use. Thus the need for attribution and appropriate compensation becomes even more acute. With enhanced XML markup in ebooks, sound clips of books might be used on the Web to promote book sales and upcoming movies.

Several digital rights management (DRM) initiatives address the needs presented by repurposing of content and required attribution. DRM techniques allow a copyrighted text or ebook to pass by the cash register before the user is given full access, audible or legible. The DRM tools must be preset to authorize use of copyrighted text by qualified representatives of the visually impaired community and to "turn on" the selected Braille or audio publishing capabilities.

Along with the NLS, the Library of Congress and the U.S. Copyright Office support an initiative called the Digital Object Architecture. That project developed a framework for distributed digital object services. To further support the accurate, robust attribution that publishers need, the International DOI Foundation (IDF) is developing a Digital Object Identifier (DOI) System, a standard method for identifying digital content for the publishing industry. It provides a framework for managing intellectual content, for linking customers with content suppliers, for facilitating electronic payment exchange, and for enabling automated copyright management for all types of media.

The Standards

In 2001 the DAISY Consortium finalized version 2.02 of its specification. DAISY 2.02 is based on the W3C's XHTML 1.0 and SMIL 1.0 specifications. This framework defines a talking-book format that enables navigation within a sequential and hierarchical structure of marked-up text synchronized with audio.
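
As a rough illustration of that synchronization, the Python sketch below parses a small SMIL 1.0 fragment and pairs each text reference with its audio clip. The file names, fragment IDs, and clip times are invented for this example, and the clip times are assumed to use the "npt=...s" form. A real DAISY 2.02 player handles far more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL 1.0 fragment: each <par> plays a text fragment
# and an audio clip in parallel. All names and times are made up.
SMIL = """\
<smil>
  <body>
    <seq>
      <par>
        <text src="book.html#h1"/>
        <audio src="chapter1.mp3" clip-begin="npt=0.000s" clip-end="npt=2.500s"/>
      </par>
      <par>
        <text src="book.html#p1"/>
        <audio src="chapter1.mp3" clip-begin="npt=2.500s" clip-end="npt=9.125s"/>
      </par>
    </seq>
  </body>
</smil>
"""


def npt_seconds(value):
    """Convert a clip time such as 'npt=2.500s' to seconds (assumed format)."""
    if not (value.startswith("npt=") and value.endswith("s")):
        raise ValueError(f"unsupported clip time: {value!r}")
    return float(value[4:-1])


def sync_points(smil_text):
    """Return (text_src, audio_src, begin, end) for every <par> with both media."""
    root = ET.fromstring(smil_text)
    points = []
    for par in root.iter("par"):
        text = par.find("text")
        audio = par.find("audio")
        if text is None or audio is None:
            continue
        points.append((
            text.get("src"),
            audio.get("src"),
            npt_seconds(audio.get("clip-begin")),
            npt_seconds(audio.get("clip-end")),
        ))
    return points


def text_at(points, audio_src, seconds):
    """Return the text reference spoken at a given time in an audio file, or None."""
    for text_src, src, begin, end in points:
        if src == audio_src and begin <= seconds < end:
            return text_src
    return None


POINTS = sync_points(SMIL)
```

With a table like this, a player can highlight the text being spoken, or jump the audio to the clip that matches a heading the reader selects.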

DAISY 2.02's successor, DAISY 3, has now been published as a NISO standard, Z39.86. In March of 2002, the National Information Standards Organization approved the ANSI/NISO Z39.86 standard for DTB applications. Z39.86 is a set of Open eBook compatible DTDs expanding the capability of the standard to support a wider range of functionality. It contains the following elements.

  • Package Identity - a unique identifier for the OEB publication as a whole

  • Metadata - Publication metadata (title, author, publisher, etc.)

  • Manifest - A list of files (documents, images, style sheets, etc.) that make up the publication. The manifest also includes fallback declarations for files of types not supported by this specification.

  • Spine - An arrangement of documents providing a linear reading order

  • Tours - A set of alternate reading sequences through the publication, such as selective views for various reading purposes, reader expertise levels, etc.

  • Guide - A set of references to fundamental structural features of the publication, such as table of contents, foreword, bibliography, etc.

The informal outline of the package is as follows:

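<?xml version='1.0'?>
<!DOCTYPE package
   PUBLIC "+//ISBN x-xxxxxxx-x-x//DTD OEB 1.0.1 Package //EN"
   "oebpkg101.dtd">
<package unique-identifier="...">
   <metadata> ... </metadata>
   <manifest> ... </manifest>
   <spine> ... </spine>
   <tours> ... </tours>
   <guide> ... </guide>
</package>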

A DTB conforming to this standard must include exactly one Package File. This must be a valid XML document conforming to the OEBF Publication Structure 1.0.1 package DTD (oebpkg101.dtd). The full specification, DTD, and entity reference set for the OEBF package file are available for download from the OEBF site. The Package File must be named with the extension ".opf". If a DTB spans multiple media units, the same package file must be present on each media unit.
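
To make the package rules concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of the standard: the sample package, its file names and media types, and the helper functions are all hypothetical, and the sample leaves out the DOCTYPE, metadata, tours, and guide for brevity. It checks the "exactly one .opf file" rule and resolves the spine against the manifest to recover the linear reading order.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed package file. File names and media types
# are invented for this example.
PACKAGE = """\
<package unique-identifier="uid">
  <manifest>
    <item id="text" href="book.xml" media-type="text/xml"/>
    <item id="ch1" href="chapter1.smil" media-type="application/smil"/>
    <item id="ch2" href="chapter2.smil" media-type="application/smil"/>
    <item id="nav" href="book.ncx" media-type="text/xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def find_package_file(filenames):
    """Return the single '.opf' file in a file listing, or raise ValueError."""
    opf = [name for name in filenames if name.lower().endswith(".opf")]
    if len(opf) != 1:
        raise ValueError(f"expected exactly one .opf package file, found {len(opf)}")
    return opf[0]


def reading_order(package_text):
    """Resolve each spine itemref against the manifest, in document order."""
    root = ET.fromstring(package_text)
    manifest = {item.get("id"): item.get("href") for item in root.iter("item")}
    order = []
    for ref in root.iter("itemref"):
        idref = ref.get("idref")
        if idref not in manifest:
            raise ValueError(f"spine references unknown manifest item: {idref!r}")
        order.append(manifest[idref])
    return order


ORDER = reading_order(PACKAGE)
```

The manifest is an unordered inventory, while the spine supplies the order. Keeping the two separate is what lets the tours offer alternate sequences over the same set of files.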

The full list of the different DTDs within the standard includes:

  • Packaging - required

  • Text - required

  • Images - optional

  • SMIL - required

  • NCX - (navigation controls) required

  • Resource file - optional

  • XSL - (style sheet for text only)

  • Distribution - required when output spans multiple media units, e.g., multiple CDs
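
The required and optional status of each component lends itself to a simple completeness check. The Python sketch below follows the list above. The component names and the function are hypothetical, and treating XSL as optional is an assumption, since the list does not state its status.

```python
# Component status as given in the list above. XSL has no stated status,
# so treating it as optional here is an assumption.
REQUIRED = {"packaging", "text", "smil", "ncx"}
OPTIONAL = {"images", "resource", "xsl"}


def missing_components(present, media_units=1):
    """Return the required components absent from a book, sorted by name.

    The distribution component is only required when the book spans
    more than one media unit (for example, several CDs).
    """
    required = set(REQUIRED)
    if media_units > 1:
        required.add("distribution")
    return sorted(required - set(present))


SINGLE_CD = missing_components({"packaging", "text", "smil"})
THREE_CDS = missing_components({"packaging", "text", "smil", "ncx"}, media_units=3)
```

Here `SINGLE_CD` reports the missing NCX file, and `THREE_CDS` reports that a three-disc book still needs its distribution information.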

The full DTD is available in hardcopy from NISO for $99 US, or for download as a PDF.

The Z39.86 DTDs provide a very robust solution for delivering text as speech, narrated or synthetic. The user community still prefers human to computer-generated speech, although TTS solutions are becoming smoother and more natural. With the guidance of this standard, readers are offered a wide range of features not available in previous reading systems:

  • rapid, flexible navigation;

  • bookmarking and highlighting;

  • keyword searching;

  • spelling of words on demand;

  • user control over the presentation of selected items (e.g., footnotes, page numbers, etc.).
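
Two of these features, keyword searching and spelling on demand, are easy to picture once the text is available as marked-up segments with unique IDs. The Python sketch below is purely illustrative: the segment IDs, sample text, and function names are hypothetical, and a real player would hand the results to its speech output and navigation layers.

```python
# Hypothetical text segments, keyed by the unique IDs that the markup provides.
SEGMENTS = [
    ("p1", "The talking book is not new."),
    ("p2", "Digital Talking Books add navigation."),
    ("p3", "Analog narration is linear."),
]


def find_keyword(segments, keyword):
    """Return the IDs of segments containing the keyword, ignoring case."""
    needle = keyword.lower()
    return [seg_id for seg_id, text in segments if needle in text.lower()]


def spell(word):
    """Spell a word letter by letter, skipping punctuation such as hyphens."""
    return " ".join(ch.upper() for ch in word if ch.isalnum())


HITS = find_keyword(SEGMENTS, "talking")
SPELLED = spell("e-book")
```

Because each hit is a segment ID rather than a position in a recording, the player can jump straight to the matching passage, something a linear analog narration cannot do.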

Readers facing visual and physical barriers to information are now given a more flexible and efficient reading system.

Another Push to the Application

The US Congress is considering the Instructional Materials Accessibility Act. Still in committee, this bill would require that material used in educational institutions be provided in an electronic format compatible with a given DTB standard, to facilitate wider use of textbook materials within the visually impaired community. Publishers and educational institutions are supporting the bill.

Whether the audio portions of books prove to be a marketable opportunity for publishers is yet to be determined. What is clear is that the visually-impaired community is receiving support from the highest institutions to produce much more robust reading solutions for both educational and leisure reading.