Codifying Medical Records in XML

October 2, 1997

Thomas L. Lincoln, M.D.

Philosophy and Engineering


The following paper was given as a talk at the "XML Mixer" in La Jolla, California in late July 1997, before a combined audience of clinicians, computing professionals, and vendors of document processing software. What brought the group together was an ongoing effort to introduce markup technology into the processing of healthcare information in an ISO standard manner, using SGML (Standard Generalized Markup Language) and SGML's strict subset, XML (eXtensible Markup Language). Other speakers addressed processing topics, work flow, or business issues in the use of information systems in medicine more specifically; the emphasis here is on some long-perceived but often overlooked problems in the semantics of communication. Both the general and the specific are important ingredients in this area, which indirectly indicates why the document format offers the appropriate middle ground between free text and excessively rigid (but easy to process) data structures.


The overriding reason to seek better electronic records and record processing in healthcare is to acquire the ability to use clinical information in ways that will compensate for and move us beyond the cumulative deficiencies of present practice. Some of these deficiencies are the result of the increased complexity of diagnosis and therapy, following the dramatic advances that have been made in both science and technology; others are a consequence of the down-skilling allowed by the increased predictability and reliability of present procedures; and some are due to the change of healthcare focus away from virtuoso medicine toward a production mode of cost-minimization with changed incentives for profit and for care. The danger in the "industrialization" of care through this new direction is that we will arrive at an over-standardized service industry that is little more than a "people processor," and that treats every case as if it were average. By contrast, the potential offered by advanced computing is that we are now in a position to move forward to the much newer concept of mass customization, where information processing provides new means of cost-effective attention to individual needs.

Identifying the Exceptional Cases

The key to effective intervention in healthcare is triage--the ability to identify and separate the truly sick from those who are minimally ill, anxious, or merely uncomfortable. Studies have shown that more than 85% of the calls to 911 are not emergencies, and more than 85% of those seen in emergency rooms or clinic settings are not really sick (although the latter may need advice or reassurance). Thus, the triaging party must stay constantly alert amid tedium and much "noise," and must be prepared to respond to exceptions and unusual patterns that are likely to signify serious disease--a personal capacity some have called "a nose for disaster." Not everyone has this capability, and it is markedly reduced wherever examinations are minimized and routines become mindless repetitions.

This problem is not new, nor is it only a problem of healthcare.

Communication from a Linguistic Perspective

All documentation intends communication, whether over space, as with HL7 (Health Level 7, named for the seventh, application level of the ISO communication model), or over time, as with a data archive. Most documentation is centered around language, even where images or sound tracks are included; in those cases, language is used to interpret them. The relation between language and meaning is a much-studied subject outside of medicine, with a fruitful history. For example, Susanne K. Langer [1], in an insightful citation, translates and quotes Philipp Wegener [2] as follows:

All discourse involves two elements, which may be called, respectively, the context (verbal or practical) and the novelty. The novelty is what the speaker is trying to point out or to express. For this purpose he will use any word that serves him. The word may be apt, or it may be ambiguous, or even new; the context, seen or stated, modifies it and determines just what is meant.

The quote is well taken. Meaning must be grasped between two separate components: the context, and what is to be specified by naming (i.e., the novelty). This has been fruitfully restated in an on-line discussion by Lloyd Harding [4]:

There seems to be a natural tension between authors (creators of information) and readers (in our particular context--computers). Creators want to be able to say what they need to say. Computers want you to say it in one particular way so it can process the information.

Present Shortcomings

It appears that healthcare informatics has been on the side of the computers, directing most of its effort toward establishing a standard vocabulary for a complete set of specific observations, thereby standardizing the description of novelty and depriving the authors of their initiative. Forcing what is captured into standard observations that fit neatly into data elements, designed to be retrieved using this same vocabulary, diminishes or even removes the balance that context could provide. (This approach also assumes that significant change in what is observed and/or documented is slow enough to be ignored.)

What Markup Has to Offer

To my mind, SGML and XML offer a better solution by addressing both the context and the "novelty," in order to arrive at an intended specification. This is not to deny that in healthcare almost all of the context and most of the observations to be made have both a predictable form and a predictable vocabulary, albeit less than stationary. However, it is precisely where the "shoe does not fit" that the key to successful diagnosis and therapeutic management is often found. It is the sick who are the outliers.

Given the ability to include an associative processing function and an intelligent human override, it should be possible to lay out the general descriptive components for markup well enough. For example, in a document such as a classical "History and Physical Examination" (or H&P), the general context is contained in the ritualized and thus more or less standardizable outline (and the sub-outlines contained within it, according to what problems are found). Here Langer points out further:

Since the context of an expression tells us what is its sense . . . and how . . . it is to be interpreted--it follows that the context itself must always be expressed literally, because it has not, in turn, a context to supplement and define its sense.

Thus, in our domain, the tags for an outline are not only the easiest vocabulary to standardize; standardizing them also does not stand in the way of a more dynamic specification of observations and actions. Moreover, these observations and actions can be further specified and indexed by standard attributes, which can be treated as interpretive caveats. Consider two separate coding schemes which, by virtue of their different objectives, are complementary:

  • SNOMED (Systematized Nomenclature of Medicine) codes, which are intended to capture the variety of detail within a single diagnosis (such as otitis media) using a multi-factorial interpretive scheme
  • ICD (International Classification of Diseases) codes, which index disease for epidemiological purposes (and now billing) as aggregates

ICD codes become a means of extracting a complete set of diagnoses, perhaps with an oversort, and SNOMED offers a means of dividing these cases in a useful, analytic manner. By using markup to code a diagnosis both ways, we can, in this somewhat more complicated manner, achieve a precision that is much more like an asymptotic relationship than a simple, rigid definition, with the final judgment left to the user.
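As a rough illustration of coding a diagnosis both ways, the following sketch marks up a fragment of three H&P documents with outline tags and attaches both kinds of code as attributes. It is only a sketch under stated assumptions: the element names (`hp`, `section`, `diagnosis`), the attribute names (`icd`, `snomed`), and every code value are hypothetical placeholders invented for this example, not entries from a published DTD or from the real ICD or SNOMED tables. The example is written in Python using its standard-library XML parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical records: element names, attribute names, and code values
# are placeholders for illustration, not real ICD or SNOMED entries.
RECORDS = """
<records>
  <hp patient="A">
    <section name="assessment">
      <diagnosis icd="ICD-X1" snomed="SNO-ACUTE">Otitis media, acute, left ear</diagnosis>
    </section>
  </hp>
  <hp patient="B">
    <section name="assessment">
      <diagnosis icd="ICD-X1" snomed="SNO-CHRONIC">Otitis media, chronic with effusion</diagnosis>
    </section>
  </hp>
  <hp patient="C">
    <section name="assessment">
      <diagnosis icd="ICD-X2" snomed="SNO-OTHER">Pharyngitis</diagnosis>
    </section>
  </hp>
</records>
"""


def extract_by_icd(root, icd_code):
    """Broad pass: return (patient, diagnosis element) pairs carrying the ICD-style code."""
    hits = []
    for hp in root.iter("hp"):
        for dx in hp.iter("diagnosis"):
            if dx.get("icd") == icd_code:
                hits.append((hp.get("patient"), dx))
    return hits


def divide_by_snomed(hits):
    """Analytic pass: group the extracted cases by their SNOMED-style code."""
    groups = defaultdict(list)
    for patient, dx in hits:
        groups[dx.get("snomed")].append((patient, dx.text))
    return dict(groups)


root = ET.fromstring(RECORDS.strip())
cases = extract_by_icd(root, "ICD-X1")   # the aggregate set of cases
groups = divide_by_snomed(cases)         # the same set, subdivided
for code, members in sorted(groups.items()):
    print(code, members)
```

The aggregate code does the coarse retrieval and the finer code subdivides the result, while the free text of each diagnosis stays in the document for the reader to judge.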

With this in mind, the purpose of a tagged database is to increase the descriptive potential of the data, so that different problem-solving relationships can be extracted for different application programs, each designed to serve the particular interests of one set out of a variety of stakeholders. One conceives of an electronic patient healthcare record that is a compendium of time-oriented, marked-up documents stored relationally in folders, which can be searched and abstracted into objects designed for these particular applications with their particular objectives.[1]
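One way to picture this abstraction step is sketched below in Python. It is a minimal sketch, not a description of any existing record system: the folder is simply an in-memory list standing in for relational storage, and the element names (`note`, `complaint`, `diagnosis`), the attributes (`date`, `type`, `icd`), the code values, and the two object types are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date

# Hypothetical patient folder: each entry is one dated, marked-up document.
# Tag names, attribute names, and code values are illustrative placeholders.
FOLDER = [
    '<note date="1997-03-02" type="clinic"><complaint>ear pain</complaint>'
    '<diagnosis icd="ICD-X1">otitis media</diagnosis></note>',
    '<note date="1997-01-15" type="clinic"><complaint>sore throat</complaint>'
    '<diagnosis icd="ICD-X2">pharyngitis</diagnosis></note>',
]


@dataclass
class BillingItem:
    """Object abstracted for a billing application (hypothetical)."""
    visit_date: date
    icd: str


@dataclass
class ClinicalEvent:
    """Object abstracted for a clinician's timeline view (hypothetical)."""
    visit_date: date
    complaint: str
    diagnosis: str


def parse_folder(folder):
    """Parse every document and return the root elements in chronological order."""
    roots = [ET.fromstring(doc) for doc in folder]
    return sorted(roots, key=lambda r: date.fromisoformat(r.get("date")))


def billing_view(roots):
    """Abstract only what a billing application needs: date and aggregate code."""
    return [BillingItem(date.fromisoformat(r.get("date")), dx.get("icd"))
            for r in roots for dx in r.iter("diagnosis")]


def clinical_view(roots):
    """Abstract what a clinician needs: date, complaint, and diagnosis text."""
    return [ClinicalEvent(date.fromisoformat(r.get("date")),
                          r.findtext("complaint"), r.findtext("diagnosis"))
            for r in roots]


roots = parse_folder(FOLDER)
print(billing_view(roots))
print(clinical_view(roots))
```

The same tagged documents serve both views; neither application dictates how the documents themselves are written.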

Some example objectives are (with considerable overlap):

  • Identifying the outlier sick individuals in a clinic practice where most with similar symptoms have minor illnesses
  • Adjusting the presentation of information to the problem-solving style of each user
  • Determining when a guideline is appropriate and when it should be overridden

It is the neutral enabling property of SGML that is hard for some to see at first exposure. It allows more, by doing less. It does not complete the job, but rather leaves that to another module: the application, and yet another: the judgment of the user.

  1. Langer, Susanne Katherina Knauth. "Philosophy in a new key: A study in the symbolism of reason, rite and art." Cambridge, Mass.: Harvard University Press, 1942.
  2. Wegener, Philipp. "Untersuchungen über die Grundfragen des Sprachlebens" [first published 1885] newly edited, with an introduction by Clemens Knobloch, by Konrad Koerner. Amsterdam; Philadelphia: J. Benjamins Pub. Co., 1991.
  3. For an English translation see Abse, D. Wilfred. "Speech and reason. Language disorder in mental disease & a translation of 'The life of speech' [by] Philipp Wegener." Charlottesville, University Press of Virginia, 1971.
  4. Harding, Lloyd, on listserv

About the Author

Thomas L. Lincoln, M.D.
RAND Corporation
1700 Main Street
Santa Monica, CA 90407

Thomas L. Lincoln, M.D. (Yale Med 1960) took his advanced training in Pathology at Yale and Johns Hopkins before joining the staff of the Institute for Applied Mathematics at the University of Maryland, and later the National Institute of General Medical Sciences, NIH. He has been a Senior Scientist at RAND in Santa Monica since 1967. Having retired in 1996 as Emeritus Professor of Research Pathology after 20 years at the University of Southern California, he has just taken a position on the faculty of the School of Biomedical and Health Information Sciences, College of Associated Health Professions, University of Illinois at Chicago. His interests over more than 30 years have been in various aspects of medical computing, with emphasis in the past ten years on health care information systems. This led to work over 20 years with Andersen Consulting, and to a role as reviewer of the NLM IAIMS (Integrated Advanced Information Management Systems) programs. He is a member of the American College of Medical Informatics, American Medical Informatics Association, American Medical Association, IEE, ACM, etc.
