XML.com

Introduction to BITS (Book Interchange Tag Suite)

January 18, 2019

Debbie Lapeyre

Deborah A. Lapeyre, one of the developers of BITS and a member of both the BITS committee and the JATS Standing Committee, introduces us to the BITS tag set for the archiving and interchange of technical books.

The Book Interchange Tag Suite (BITS) is an XML document model for STEM books that is based on JATS (the Journal Article Tag Suite, ANSI/NISO Z39.96-2015). BITS is a named collection of XML elements and attributes for describing the structural and semantic content of books and book components, as well as a packaging element for interchange of book parts. BITS provides a robust book model that is compatible with JATS, making it easy for publishers of both journals and books to publish them using the same system.

Why Yet Another Book Model?

Why add another book model to a world that already has DITA, TEI, DocBook, and dozens of other public and proprietary book models? The springboard for developing BITS was the observation that a JATS-based XML book model, or a set of related book models, would be useful to a wide variety of publishers of professional and scholarly books, especially but not exclusively to publishers who already use one of the JATS journal article models and are therefore looking for a compatible model for their books. Just as wide support for JATS has enabled many journal publishers to move to XML, a well-supported XML book model (appropriate to book publishing requirements and compatible with current citation management tools) would enable book publishers to move to XML and, through XML, to electronic publication and archiving. By developing book models based on the existing JATS Tag Suite, we hoped to enable publishers to include their books in the systems that already create, manage, publish, and archive their journal articles and to build on the investment they, as well as their suppliers and vendors, have made in learning and developing around JATS.

Purpose and Scope

The goal for BITS is to provide an XML tag set to support interchange, archiving, format conversion, and publishing for scientific, reference, higher education, technical, and medical books. The BITS book models are not intended to describe trade books, cookbooks, grade-school textbooks, legal works, historical editions, or any of the wide variety of books outside the current scientific, technical, engineering, and medical realms in which JATS is used for journals. Although BITS is currently supported by the National Library of Medicine, this BITS book model is usable beyond life sciences publishing, just as the NISO JATS journal article models are useful in physics, social sciences, linguistics, and poetry.

BITS supports marking up the structural and semantic content of books so that the material can be reused, repurposed, and made more discoverable. As in JATS, this purpose implies that reproducing a particular book format or look-and-feel is not a BITS goal.

JATS as Design Basis

An explicit goal for BITS was the creation of models that would enable the construction of books composed of articles. The intent was to enable the bodies of journal articles to pass nearly unchanged into book parts, with changes to only the outer wrapping element and certain book-specific metadata reflecting the move from an issue of a journal to the chapter of a book. Therefore, the BITS book model is based on the JATS Journal Archiving and Interchange Tag Set (known as 'Green' from the colors of the Tag Library pages), with book metadata in BITS replacing the journal and issue metadata from JATS Archiving. The BITS Book Interchange Tag Set is a superset customization of the JATS Journal Archiving and Interchange Tag Set, with added material to describe STEM books, book components such as chapters, and information concerning the inclusion of books and book components in book series. The XInclude mechanism allows books to be managed in component pieces.
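
As a rough illustration of managing a book in component pieces, the following Python sketch assembles a master book file from separately stored chapters using the standard library's XInclude support. The file names, titles, and the in-memory COMPONENTS table are hypothetical, and the XML is a simplified skeleton that has not been validated against the BITS DTD; in a real workflow the components would be files resolved by an XInclude-aware parser.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical component files, held in memory to keep the sketch self-contained.
COMPONENTS = {
    "chapter1.xml": """<book-part book-part-type="chapter">
  <book-part-meta><title-group><title>Cells</title></title-group></book-part-meta>
  <body><p>Text of the first chapter.</p></body>
</book-part>""",
    "chapter2.xml": """<book-part book-part-type="chapter">
  <book-part-meta><title-group><title>Genes</title></title-group></book-part-meta>
  <body><p>Text of the second chapter.</p></body>
</book-part>""",
}

# Simplified master file: the book body only points at its components.
MASTER = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <book-meta>
    <book-title-group><book-title>Sample Book</book-title></book-title-group>
  </book-meta>
  <book-body>
    <xi:include href="chapter1.xml"/>
    <xi:include href="chapter2.xml"/>
  </book-body>
</book>"""


def load_component(href, parse, encoding=None):
    """Loader for ElementInclude: return the parsed component named by href."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(COMPONENTS[href])


def assemble(master_xml):
    """Parse the master file and replace each xi:include with its component."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_component)
    return root


def chapter_titles(book):
    """Titles of the book parts that sit directly in the book body."""
    path = "./book-body/book-part/book-part-meta/title-group/title"
    return [title.text for title in book.findall(path)]


book = assemble(MASTER)
print(chapter_titles(book))
```

After assembly, the book body holds ordinary book-part elements, so downstream processing does not need to know that the chapters were stored separately.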

BITS was constructed by using the JATS modules and the JATS customization mechanism, and by adding modules to define components that are specific to books. This relationship between JATS and BITS has been quite strictly defined: 'The models should be as similar as possible and only as different as necessary'. Therefore, if JATS has a named structure that also occurs in books, the JATS name (and, to the extent possible, the JATS content model and attributes) was used for BITS.

BITS describes both the metadata and the narrative content of a book, both the metadata and narrative content for book components, and collection-level metadata for book sets and book series when a book part is associated with one or more such collections. The metadata for the book is held in the element <book-meta>, rather than in an element named 'front' like the one that holds the article metadata in a JATS journal article. Book metadata is unlike that for journal articles, making this one of the real changes from JATS.

Naming Parts of a Book

BITS is entirely agnostic concerning the many terms the publishing industry uses to name components of books, such as chapter, part, unit, module, lesson, segment, division, and section. BITS divides books into book parts (element <book-part>) and leaves it up to the publisher or other BITS user to call them chapters or units or modules or anything else.
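
One way a publisher can record its own term for a book part is the book-part-type attribute on <book-part>. The Python sketch below reads that attribute from a simplified, unvalidated fragment; the attribute values ('unit', 'chapter'), the titles, and the fallback label are illustrative assumptions, not values prescribed by BITS.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: every division is a <book-part>, and only the
# book-part-type attribute carries the publisher's own term for it.
BODY = """<book-body>
  <book-part book-part-type="unit">
    <book-part-meta><title-group><title>Measurement</title></title-group></book-part-meta>
  </book-part>
  <book-part book-part-type="chapter">
    <book-part-meta><title-group><title>Units and Scales</title></title-group></book-part-meta>
  </book-part>
  <book-part>
    <book-part-meta><title-group><title>Untyped Division</title></title-group></book-part-meta>
  </book-part>
</book-body>"""


def part_labels(xml_text):
    """Return (publisher term, title) pairs for the top-level book parts."""
    root = ET.fromstring(xml_text)
    labels = []
    for part in root.findall("book-part"):
        # Fall back to the generic element name when no type is recorded.
        term = part.get("book-part-type", "book-part")
        title = part.findtext("book-part-meta/title-group/title")
        labels.append((term, title))
    return labels


for term, title in part_labels(BODY):
    print(f"{term}: {title}")
```

Because the element name never changes, the same processing code handles a 'unit' and a 'chapter' identically and uses the attribute only when a label is needed.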

There are, however, a few named book parts in BITS, largely in the narrative front matter and the back matter of a book. Because BITS, like its JATS parent, does not lead publishers but tries to consolidate current publishing practice, the following named book parts (which were requested explicitly by book publishers) have been built into BITS: Dedication, Foreword, Preface, Table of Contents (a structural Table of Contents that can be edited), and Index (a structural index that can be edited). The Index and Table of Contents structures are modeled with specific semantic elements. By contrast, the named front matter parts, such as 'Preface', are modeled as generic structures. This means that a BITS user can choose to use the named book parts or just use the element <book-part> to tag all the parts of the book.

Publishers also give many names to collections of books and/or book components, such as book sets, book series, monograph series, and the like. BITS is entirely agnostic concerning such collective nouns and merely collects the metadata naming such a grouping.

Structure of a BITS Book

There are two top-level elements in the BITS model:

  • the Book element (<book>), to contain an entire document such as a textbook or a monograph; and

  • the Book Part Wrapper element (<book-part-wrapper>), to contain a book part such as a 'chapter' or 'module' that needs to be handled as a discrete unit.

Just as the XML of a JATS journal article may contain only the metadata for an article (the narrative text of the body and back matter are optional), a BITS book element may contain only the metadata for a book (the narrative text of the book, the back matter, and any book part are optional). This allows publishers and archives to use both JATS and BITS for exchange of metadata, even when not preserving the textual content of a document in XML.

When both the metadata and the text of a book are to be tagged in XML, a BITS book may be composed of the following components:

  • Collection Metadata (optional, repeatable). Bibliographic metadata describing a book set or book series to which this book or book part belongs. A book or book part may be part of many collections.

  • Book Metadata (optional). The book metadata element (<book-meta>) contains the publishing metadata for the book, for example, the title of the book, the date of publication, the publisher's name and location, a copyright statement, etc. This is not the textual front matter that appears at the beginning of a book; rather, it is bibliographic information about the book.

  • Front Matter (optional). If present, the front matter element (<front-matter>) contains the textual front material for a book, such as a Dedication, Foreword, or Preface. (Note: This is a different nomenclature than in JATS, where 'front matter' refers to the metadata of the journal article.)

  • Body of the Book (optional). If present, the body of the book element (<book-body>) contains the narrative of the work, the main textual and graphic content of the book. The body of a book is composed of book parts (<book-part>), which may be called parts, sections, chapters, modules, lessons, or whatever divisions a publisher has named.

    Book parts contain paragraphs, sections, tables, figures, quotations, and all the textual material and elements that make up the narrative and graphics for a book. Book parts are recursive, so they may contain other book parts. For example, 'Part 3' of a book could contain several 'Chapters', each of which could have a foreword, the body of the chapter, one or more appendices, and a reference list.

  • Back Matter for the Book (optional). If present, the book back matter element (<book-back>) contains information that is ancillary to the main text, such as a glossary, appendix, or list of cited references. The back matter may also contain floating material (<floats-group>), a container element for all the 'floating' objects (such as tables, figures, and sidebars) in a book. The back matter of book parts (<back>) and the <book-part-wrapper> element may also contain their own, separate Floating Material elements (<floats-group>).
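
The components listed above can be sketched as a single skeleton. The Python example below parses a simplified BITS book and prints its top-level components and an outline of its nested book parts. The titles are invented, the placement of nested book parts inside the parent's <body> reflects our reading of the BITS model, and the XML has not been validated against the BITS DTD, so treat it as an illustration rather than a conforming document.

```python
import xml.etree.ElementTree as ET

# Simplified skeleton: one collection, book metadata, front matter,
# a body with a 'part' that contains two 'chapters', and back matter.
BOOK = """<book>
  <collection-meta>
    <title-group><title>Sample Series</title></title-group>
  </collection-meta>
  <book-meta>
    <book-title-group><book-title>Sample Book</book-title></book-title-group>
  </book-meta>
  <front-matter>
    <preface>
      <named-book-part-body><p>Why this book exists.</p></named-book-part-body>
    </preface>
  </front-matter>
  <book-body>
    <book-part book-part-type="part">
      <book-part-meta><title-group><title>Part 3</title></title-group></book-part-meta>
      <body>
        <book-part book-part-type="chapter">
          <book-part-meta><title-group><title>Chapter 7</title></title-group></book-part-meta>
          <body><p>Chapter text.</p></body>
          <back><ref-list><title>References</title></ref-list></back>
        </book-part>
        <book-part book-part-type="chapter">
          <book-part-meta><title-group><title>Chapter 8</title></title-group></book-part-meta>
          <body><p>More chapter text.</p></body>
        </book-part>
      </body>
    </book-part>
  </book-body>
  <book-back>
    <ref-list><title>Further Reading</title></ref-list>
  </book-back>
</book>"""


def outline(container, depth=0):
    """Recursively list the book parts found under container."""
    lines = []
    for part in container.findall("book-part"):
        term = part.get("book-part-type", "book-part")
        title = part.findtext("book-part-meta/title-group/title", default="(untitled)")
        lines.append("  " * depth + f"{term}: {title}")
        body = part.find("body")
        if body is not None:
            # Nested book parts are looked for inside the parent's body.
            lines.extend(outline(body, depth + 1))
    return lines


book = ET.fromstring(BOOK)
print([child.tag for child in book])
print("\n".join(outline(book.find("book-body"))))
```

Deleting everything except <book-meta> from this skeleton would leave the metadata-only form of a book described earlier.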

Book Part Wrappers

The second top-level element in BITS is the book part wrapper (<book-part-wrapper>), which contains a single book part to be interchanged, along with the metadata that describes the book part and collection metadata that describes any grouping (such as a virtual book) of which the book part is a member. A book part may be associated with many collections.
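
A minimal Python sketch of packaging one chapter for interchange follows. The helper wrap_book_part, the collection titles, and the chapter content are hypothetical, and the result is a simplified skeleton that has not been validated against the BITS DTD.

```python
import copy
import xml.etree.ElementTree as ET

CHAPTER = """<book-part book-part-type="chapter">
  <book-part-meta><title-group><title>Chapter 7</title></title-group></book-part-meta>
  <body><p>Chapter text.</p></body>
</book-part>"""


def wrap_book_part(part, collection_titles):
    """Build a <book-part-wrapper> holding one book part plus collection metadata.

    Hypothetical helper: each title becomes one <collection-meta> entry,
    followed by a copy of the single book part being interchanged.
    """
    wrapper = ET.Element("book-part-wrapper")
    for title in collection_titles:
        collection = ET.SubElement(wrapper, "collection-meta")
        title_group = ET.SubElement(collection, "title-group")
        ET.SubElement(title_group, "title").text = title
    # Copy the part so the source tree it came from is left untouched.
    wrapper.append(copy.deepcopy(part))
    return wrapper


chapter = ET.fromstring(CHAPTER)
wrapper = wrap_book_part(chapter, ["Sample Series", "Virtual Book on Cells"])
print(ET.tostring(wrapper, encoding="unicode"))
```

The wrapper carries exactly one book part, while the repeated <collection-meta> elements record each grouping the part belongs to.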

Advantages of BITS for Books

Why might you consider BITS to tag your books?

Declarative: BITS markup is structural or semantic and declarative, not presentational or behavioral. This makes books and chapters easier to process and helps ensure longer-lived data.

Based on JATS: JATS is an XML model that fits the way journal articles (and preprints) are published today. BITS uses all the JATS body structures and much of the article metadata. If you already have expertise in JATS, getting into BITS is easy. If your display system was built for JATS articles, you can add BITS books relatively easily. If your search system was built for JATS articles, it will search BITS books with a small amount of adjustment.

Tag Set is Documented: Extensive Tag Libraries with explanations and examples for both element and attribute usage are available online.

The Price is Right: Tag sets (in DTD, XSD, and RNG form), Tag Library documentation, tagged examples, and some tools for QA and output production are available free from the US National Library of Medicine (NLM) at jats.nlm.nih.gov/extensions/bits/

Highly and Easily Customizable: BITS was designed to be both extended and subsetted very easily. The built-in extension mechanisms are documented in the Tag Libraries.

In short, while BITS is not ideal for language corpora, scholarly editions, legal books, or grade-school textbooks, it is a useful addition to the JATS family. If you have books that are really collections of articles, if you publish your journals in JATS, or if you have well-structured STEM books and reference works, consider BITS for your XML model.

This article was based on the General Introduction to the BITS Tag Library on the National Library of Medicine site (jats.nlm.nih.gov/extensions/bits/).
This work is licensed under a Creative Commons Attribution 4.0 International License.