Introduction to JATS (Journal Article Tag Suite)

October 12, 2018

Debbie Lapeyre

Deborah A. Lapeyre, one of the developers of JATS and a member of the JATS secretariat, introduces us to the ANSI/NISO standard for the XML interchange of journal articles.

'JATS', formally known as ANSI/NISO Z39.96-2015 JATS: Journal Article Tag Suite, is an international standard XML tag set for journal articles. JATS is an XML vocabulary (similar in purpose to other document-based XML vocabularies such as DocBook or TEI) designed to model current journal articles. JATS is a named collection of XML elements and attributes that can be used to mark the structure and semantics of a single journal article. Thus JATS does not model issues of journals, books, language corpora, patents, legislation, standards, or other document types. Originally, JATS was used for STEM (Scientific, Technical, Engineering and Medical) articles, but now journals in the humanities, sociology, economics, and the soft sciences also use JATS XML markup.

JATS has three separate models, intended for distinct audiences:

  • Archiving and Interchange Model (loose, for libraries and archives to ingest XML into their repositories)
  • Journal Article Publishing Model (tighter, for publication production, typically used by publishers, hosting platforms, and portals)
  • Article Authoring Model (designed for authors writing new articles)

Original Purpose of JATS

JATS was originally constructed for interchanging journal articles, providing interoperability of article content and article metadata among publishers and archives. In the early days, it was expected that publishers, hosters, portals, and archives would use their own XML tag sets internally and transform into JATS XML when they wanted to:

  • exchange information with each other,
  • put information into a combined repository,
  • sell/display items on the same hosting platform, and/or
  • share the development of tools and resources for common use.

The World Publishes Journal Articles in JATS

Since JATS came out in February 2003, however, many publishers have found it useful for the production and Quality Assurance testing of articles and preprints. While JATS is still widely used to build large journal repositories and for organization-to-organization interchange, publishers have also been producing their new content in JATS. The largest publishers still use their own tag sets, but small and medium-sized publishers are encoding their new journals in JATS and converting their backfiles from PDF or a proprietary tag set into JATS. Public archives such as libraries and scholarly portals find it convenient to receive article data from many sources in a single format. Private and commercial archives may require JATS for ingest, or may transform whatever format they receive into JATS for their website, repository, or app. Numerous web-hosting and service vendors require or support JATS. Free, public transforms create CrossRef deposit files from JATS. Whole communities have agreed on JATS as their data format for journal content.

JATS is in use in more than 25 countries worldwide, including Australia, Belgium, Brazil, Bulgaria, Canada, Chile, Egypt, Finland, France, Germany, Italy, Japan, Norway, Russia, South Korea, Sweden, Switzerland, the United Arab Emirates, the United Kingdom, and the United States. Most small and middle-sized publishers in North America and Europe use JATS, and publishers in many Asian and South American countries produce at least some journals in JATS. All of the very large publishers (who use their own journal models internally) can, and do, produce JATS for interchange and deposit. Many public and private archives accept (or require) JATS, including PubMed Central (both US and UK), the British Library, the National Library of Australia, the US Library of Congress, Portico/ITHAKA/JSTOR, and many more.

Conversion vendors worldwide have developed processes for, and gained significant experience in, converting into and out of JATS. Using a shared tag set means that conversion vendors do not need to learn or customize a journal tag set for each publisher. Tools have been developed to make JATS from many formats, including Microsoft Word and LaTeX, and to make high-quality PDF, HTML (HTML, XHTML, HTML5), various accessible formats, and eBooks from JATS.

Structure of a JATS Article

By design, JATS models what journal publishers are already doing. When JATS was designed, hundreds of journals were examined for structure and metadata, and over 45 existing XML and SGML journal models were scrutinized. The designers aimed for the 80/20 case: if 80% of the journals marked up the data in a similar fashion, JATS adopted those elements too, even if the JATS designers could imagine a better way to tag the material. For the remaining 20%, tagging escape hatches were added to JATS: a name-value container element for metadata, a named-content inline element to capture publisher-specific semantic tagging, and a styled-content element to record non-semantic, inline look-and-feel distinctions. The intent of publisher-specific semantic structures (such as Data Availability Statements) can be captured with information-classing attributes.
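
The escape hatches keep publisher-specific information machine-readable without adding new elements. The following minimal Python sketch, using only the standard library's xml.etree.ElementTree, reads the name-value pairs (assuming the JATS <custom-meta> container with <meta-name> and <meta-value>) and the <named-content> phrases, keyed by their content-type attribute. The meta names and the content-type value in the fragment are hypothetical, since JATS leaves those vocabularies to each publisher.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the meta names and the content-type value are
# publisher-chosen examples, not vocabulary defined by JATS itself.
FRAGMENT = """
<article-meta>
  <abstract>
    <p>We sampled <named-content content-type="species">Arabidopsis thaliana</named-content>
    and <styled-content style="color: red">highlighted</styled-content> three loci.</p>
  </abstract>
  <custom-meta-group>
    <custom-meta>
      <meta-name>peer-review-model</meta-name>
      <meta-value>open</meta-value>
    </custom-meta>
    <custom-meta>
      <meta-name>production-batch</meta-name>
      <meta-value>2018-10</meta-value>
    </custom-meta>
  </custom-meta-group>
</article-meta>
"""


def custom_meta_pairs(root: ET.Element) -> dict:
    """Collect <custom-meta> name-value pairs into a plain dictionary."""
    pairs = {}
    for meta in root.iter("custom-meta"):
        name = meta.findtext("meta-name", default="").strip()
        value = meta.findtext("meta-value", default="").strip()
        pairs[name] = value
    return pairs


def named_content(root: ET.Element) -> list:
    """Return (content-type, text) for each <named-content> element.

    <styled-content> is deliberately skipped: it records appearance only,
    so it carries no meaning for a semantic extractor.
    """
    return [
        (el.get("content-type", ""), "".join(el.itertext()).strip())
        for el in root.iter("named-content")
    ]


root = ET.fromstring(FRAGMENT.strip())
print(custom_meta_pairs(root))  # {'peer-review-model': 'open', 'production-batch': '2018-10'}
print(named_content(root))      # [('species', 'Arabidopsis thaliana')]
```

The function names custom_meta_pairs and named_content are illustrative only; they are not part of any JATS toolkit.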

JATS tries not to lead or coerce publishers, but to capture common journal practice. JATS preserves current text order (reading sequence) as much as possible and does not typically define or set 'Best Practices'.

The goal is to make it as straightforward as possible to encode anyone's journal articles in JATS. To that end, JATS is an old-fashioned document model: metadata concerning the article (and the journal, if appropriate) comes first, followed by the body and back matter of the article. This model (shown below in outline form) fits current journal production.

  • Front Matter (<front>)
    • journal-level metadata
      (journal title and identifiers)
    • article-level metadata
      (article title, author(s), identifiers like a DOI)
  • Body Matter (<body>)
    the narrative text of the article, including, for example:
    • paragraphs
    • sections
    • figures and graphics
    • tables (both XHTML and CALS)
    • equations and quotations
  • Back Matter (<back>)
    • appendices
    • bibliographic reference lists with deeply detailed
      (but completely optional) citation metadata
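
The outline maps directly onto nested XML elements. As a rough illustration, this Python sketch (standard library only) builds a skeleton with that shape. It is deliberately incomplete: it omits elements that a particular JATS model may require, such as publication dates and journal identifiers, so it should not be expected to validate as-is. The journal title, article title, and DOI are placeholders.

```python
import xml.etree.ElementTree as ET


def minimal_article(journal_title: str, article_title: str, doi: str) -> ET.Element:
    """Build a bare-bones JATS <article> with <front>, <body>, <back> in order."""
    article = ET.Element("article", {"article-type": "research-article"})

    # Front matter: journal-level metadata, then article-level metadata.
    front = ET.SubElement(article, "front")
    journal_meta = ET.SubElement(front, "journal-meta")
    journal_titles = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(journal_titles, "journal-title").text = journal_title

    article_meta = ET.SubElement(front, "article-meta")
    ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"}).text = doi
    article_titles = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_titles, "article-title").text = article_title

    # Body matter: the narrative text, here a single titled section.
    body = ET.SubElement(article, "body")
    sec = ET.SubElement(body, "sec")
    ET.SubElement(sec, "title").text = "Introduction"
    ET.SubElement(sec, "p").text = "Narrative text of the article."

    # Back matter: a reference list with only a title so far.
    back = ET.SubElement(article, "back")
    ref_list = ET.SubElement(back, "ref-list")
    ET.SubElement(ref_list, "title").text = "References"
    return article


# Hypothetical journal, title, and DOI for illustration only.
article = minimal_article("Journal of Examples", "A Sample Article", "10.1234/example.001")
ET.indent(article)  # pretty-printing; requires Python 3.9+
print(ET.tostring(article, encoding="unicode"))
```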

Metadata for Searching, Mining, Extraction

In addition to marking up the structure of an article, JATS can (always optionally) provide rich metadata needed for semantic interchange, context-sensitive search, and data mining. The extensive metadata describable in JATS travels with the XML article rather than being locked away in a separate file or database. For example, JATS can include many machine-processable external identifiers such as:

  • article and article component identifiers such as DOIs and arXiv numbers,
  • author identifiers such as ORCIDs,
  • institution identifiers (e.g., Ringgold, ISNI), and
  • funder identifiers.
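
To show how such identifiers travel inside the article, here is a small Python sketch that walks an <article-meta> fragment and lists every identifier together with its scheme attribute (pub-id-type, contrib-id-type, or institution-id-type). The ORCID iD is the widely used example iD for the fictitious researcher Josiah Carberry; the DOI, arXiv number, Ringgold number, and funder identifier are placeholders, and the scheme values shown are illustrative rather than a complete list of what JATS allows.

```python
import xml.etree.ElementTree as ET

# Which attribute names the identifier scheme on each identifier element.
ID_ELEMENTS = {
    "article-id": "pub-id-type",
    "contrib-id": "contrib-id-type",
    "institution-id": "institution-id-type",
}

# Hypothetical article-meta; all values except the example ORCID are placeholders.
ARTICLE_META = """
<article-meta>
  <article-id pub-id-type="doi">10.1234/example.001</article-id>
  <article-id pub-id-type="arxiv">1234.56789</article-id>
  <contrib-group>
    <contrib contrib-type="author">
      <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
      <name><surname>Carberry</surname><given-names>Josiah</given-names></name>
    </contrib>
  </contrib-group>
  <aff id="aff1">
    <institution-wrap>
      <institution-id institution-id-type="ringgold">00000</institution-id>
      <institution>Example University</institution>
    </institution-wrap>
  </aff>
  <funding-group>
    <award-group>
      <funding-source>
        <institution-wrap>
          <institution-id institution-id-type="doi">10.13039/000000000</institution-id>
          <institution>Example Funding Agency</institution>
        </institution-wrap>
      </funding-source>
    </award-group>
  </funding-group>
</article-meta>
"""


def external_ids(root: ET.Element) -> list:
    """Return (element, scheme, value) for every identifier element found."""
    found = []
    for tag, scheme_attr in ID_ELEMENTS.items():
        for el in root.iter(tag):
            found.append((tag, el.get(scheme_attr, "unspecified"), (el.text or "").strip()))
    return found


for tag, scheme, value in external_ids(ET.fromstring(ARTICLE_META.strip())):
    print(f"{tag:15} {scheme:9} {value}")
```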

Article-level metadata that supports citing the article and crediting its authors can include:

  • article title (with alternative languages if desired);
  • article identifiers (DOI, publisher identifiers, etc.);
  • copyrights and permissions;
  • multiple abstracts (to capture sectional abstracts, graphical abstracts, stereochemical abstracts, Twitter-length abstracts, RSS/Atom abstracts, and ordinary paragraph abstracts);
  • finding aids such as keywords, terms, and subject classifications (which may be given in multiple languages and linked to ontologies/taxonomies);
  • detailed descriptions of the funding behind the research on which the article reports, including:
    • both monetary and non-monetary funding
    • award and grant identifiers
    • funding sources and principals
  • links to companion articles or resources (different language versions, errata, updates, related books or web pages)
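
Because the funding description is structured rather than free text, it can be extracted mechanically. This Python sketch flattens a hypothetical <funding-group> into plain data: each <award-group> yields its funding sources, award numbers, and principal award recipients, and the human-readable <funding-statement> is kept alongside. Agency names, the award number, and the recipient are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical funding metadata: agency names, the award number, and the
# recipient are placeholders. The second award-group has no award number,
# as might happen for in-kind (non-monetary) support.
FUNDING = """
<funding-group>
  <award-group id="award1">
    <funding-source>Example Research Council</funding-source>
    <award-id>ERC-0001</award-id>
    <principal-award-recipient>
      <name><surname>Doe</surname><given-names>Jane</given-names></name>
    </principal-award-recipient>
  </award-group>
  <award-group id="award2">
    <funding-source>Example Telescope Facility</funding-source>
  </award-group>
  <funding-statement>This work was supported by the Example Research Council
    and by observing time at the Example Telescope Facility.</funding-statement>
</funding-group>
"""


def person_name(name: ET.Element) -> str:
    """Join given names and surname, skipping whichever is missing."""
    given = name.findtext("given-names", default="").strip()
    surname = name.findtext("surname", default="").strip()
    return " ".join(part for part in (given, surname) if part)


def summarize_funding(group: ET.Element) -> dict:
    """Flatten a <funding-group> into plain Python data."""
    awards = []
    for award in group.findall("award-group"):
        awards.append({
            "sources": ["".join(s.itertext()).strip() for s in award.findall("funding-source")],
            "award_ids": [a.text.strip() for a in award.findall("award-id") if a.text],
            "recipients": [person_name(n) for n in award.findall("principal-award-recipient/name")],
        })
    statement_el = group.find("funding-statement")
    # Collapse the line breaks and indentation inside the statement text.
    statement = " ".join("".join(statement_el.itertext()).split()) if statement_el is not None else ""
    return {"awards": awards, "statement": statement}


summary = summarize_funding(ET.fromstring(FUNDING.strip()))
for award in summary["awards"]:
    print(award)
print(summary["statement"])
```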

Contributors to the writing and research behind today's articles are not limited to the authors and editors typically cited. Contributors may also be photographers, study designers, curators, genome sequencers, specimen collectors, visualization artists, or people in other named roles. JATS support for recording contributor information includes:

  • unique external identifiers
  • the ability to record both initials and full names
  • the ability to record only a given name, without a surname, as is the custom in some Southeast Asian countries
  • alternative names for one individual (both Japanese characters and a romanized version, for example)
  • description of the role or roles played by the person (possibly linked to CRediT taxonomy terms)
  • multiple affiliations (with alternative names for multiple languages)
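
Here is a sketch of reading this contributor information back out, again in Python with the standard library. It assumes <name-alternatives> holds language-tagged versions of one person's name, picks the English version when present, resolves <xref ref-type="aff"> links to <aff> elements, and records <role> as plain text (it does not attempt any CRediT linkage). All people, identifiers, and institutions are hypothetical, including the placeholder ORCID.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

# Hypothetical contributors: one with Japanese and romanized names, one with
# a given name only. The ORCID is a placeholder, not a real record.
META = """
<article-meta>
  <contrib-group>
    <contrib contrib-type="author">
      <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0000-0000-0000</contrib-id>
      <name-alternatives>
        <name name-style="eastern" xml:lang="ja"><surname>山田</surname><given-names>太郎</given-names></name>
        <name name-style="western" xml:lang="en"><surname>Yamada</surname><given-names>Taro</given-names></name>
      </name-alternatives>
      <role>Data curation</role>
      <xref ref-type="aff" rid="aff1 aff2"/>
    </contrib>
    <contrib contrib-type="author">
      <name name-style="given-only"><given-names>Budi</given-names></name>
      <role>Specimen collection</role>
      <xref ref-type="aff" rid="aff1"/>
    </contrib>
  </contrib-group>
  <aff id="aff1">Example University</aff>
  <aff id="aff2">Example Institute</aff>
</article-meta>
"""


def pick_name(contrib: ET.Element, lang: str = "en"):
    """Prefer the name tagged with `lang`; otherwise fall back to the first name."""
    names = contrib.findall("name") + contrib.findall("name-alternatives/name")
    for name in names:
        if name.get(XML_LANG) == lang:
            return name
    return names[0] if names else None


def display_name(name: ET.Element) -> str:
    given = name.findtext("given-names", default="").strip()
    surname = name.findtext("surname", default="").strip()
    # Eastern-style names put the family name first (joined with a space here for simplicity).
    parts = (surname, given) if name.get("name-style") == "eastern" else (given, surname)
    return " ".join(part for part in parts if part)


def contributors(meta: ET.Element) -> list:
    affs = {aff.get("id"): "".join(aff.itertext()).strip() for aff in meta.iter("aff")}
    result = []
    for contrib in meta.iter("contrib"):
        name = pick_name(contrib)
        orcid = contrib.find("contrib-id[@contrib-id-type='orcid']")
        # rid may list several IDs separated by spaces.
        rids = [rid for x in contrib.findall("xref[@ref-type='aff']") for rid in x.get("rid", "").split()]
        result.append({
            "name": display_name(name) if name is not None else "",
            "orcid": orcid.text.strip() if orcid is not None else None,
            "roles": [r.text.strip() for r in contrib.findall("role") if r.text],
            "affiliations": [affs[rid] for rid in rids if rid in affs],
        })
    return result


for person in contributors(ET.fromstring(META.strip())):
    print(person)
```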

Detailed bibliographic references for an article allow publication tracking and help establish scientific relevance through citation. The named components of a reference allow software to check references against CrossRef, figshare, or other DOI repositories to improve reference accuracy. JATS can support nearly any type and style of reference tagging, including data citations (with access times). For citing journal articles, specific elements include contributor names, article title, journal title and issue, first and last page numbers, day-month-year of publication, DOI, and more. Similarly, JATS has named elements sufficient to cite books, patents, conference papers, standards, and other works.
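
As an illustration of what the named components make possible, this Python sketch extracts the fields of a hypothetical journal reference tagged with <element-citation> and then renders them in one possible Vancouver-like style. Because the markup records components rather than punctuation, a different citation style would need only a different formatting function. The sketch stops short of querying CrossRef or any other service, but fields like these are the kind of input such a check would use. All reference values are placeholders.

```python
import xml.etree.ElementTree as ET

# A hypothetical journal reference; every value is a placeholder.
REF = """
<ref id="r1">
  <element-citation publication-type="journal">
    <person-group person-group-type="author">
      <name><surname>Doe</surname><given-names>J</given-names></name>
      <name><surname>Roe</surname><given-names>R</given-names></name>
    </person-group>
    <article-title>An example article</article-title>
    <source>Journal of Examples</source>
    <year>2017</year>
    <volume>12</volume>
    <issue>3</issue>
    <fpage>101</fpage>
    <lpage>110</lpage>
    <pub-id pub-id-type="doi">10.1234/example.002</pub-id>
  </element-citation>
</ref>
"""

FIELDS = ("article-title", "source", "year", "volume", "issue", "fpage", "lpage")


def parse_citation(ref: ET.Element) -> dict:
    """Pull the named components of an <element-citation> into a dictionary."""
    cit = ref.find("element-citation")
    record = {field: (cit.findtext(field) or "").strip() for field in FIELDS}
    record["type"] = cit.get("publication-type", "")
    record["authors"] = [
        f"{n.findtext('surname', '').strip()} {n.findtext('given-names', '').strip()}".strip()
        for n in cit.findall("person-group[@person-group-type='author']/name")
    ]
    record["doi"] = (cit.findtext("pub-id[@pub-id-type='doi']") or "").strip()
    return record


def format_citation(rec: dict) -> str:
    """Render one possible (Vancouver-like) style; the markup itself is style-neutral."""
    return (f"{', '.join(rec['authors'])}. {rec['article-title']}. {rec['source']}. "
            f"{rec['year']};{rec['volume']}({rec['issue']}):{rec['fpage']}-{rec['lpage']}.")


record = parse_citation(ET.fromstring(REF.strip()))
print(format_citation(record))  # Doe J, Roe R. An example article. Journal of Examples. 2017;12(3):101-110.
print(record["doi"])            # 10.1234/example.002
```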

Advantages of JATS for Journal Articles

Why might you consider JATS if you produce journal articles?

Declarative: JATS markup is structural or semantic and declarative, not presentational or behavioral. This makes articles easier to process and helps ensure longer-lived data.

Designed for Articles: JATS is an XML model that fits the way journals (and preprints) are published today.

Tag Set is Documented: Extensive Tag Libraries with explanations and examples for both element and attribute usage are available online, as are many Best Practice recommendations.

The Price is Right: Tag sets (in DTD, XSD, and RNG form), Tag Library documentation, tagged examples, and some tools for QA and output production are available free from:

  • US National Library of Medicine (NLM)
  • GitHub (JATS4R, eLife, et al.)

Highly and Easily Customizable: JATS was designed to be both extended and subsetted very easily. The built-in extension mechanisms are documented in the Tag Libraries.

Not a Static Standard: JATS changes as publishers and other users request new features or modifications. Your comments can be submitted to www.niso.org/standards-committees/jats.

In short, JATS is not new, sexy, or exciting, but it is immensely useful in the journal world. In the words of Jeff Beck (of PubMed Central):

'JATS is no longer one of the cool kids;
it's just what you do if you have journal articles.'