News Wire Services Heading for XML

August 12, 1998

Tim Bray

Where Does News Come From?

Ever wonder where news comes from? Look through your morning paper, or a news-oriented Web site, and ask yourself: did the newspaper's or Web site's own reporters write all the coverage of Japanese credit problems, Kosovar uprisings, and Mississippi floods?

Obviously not. The vast majority of news comes from news agencies. The most famous of these among English-speakers is the Associated Press, but many people have also heard of Agence France-Presse and Reuters. There are lots of others, and they have their own industry organizations, such as the Newspaper Association of America (NAA) and the International Press Telecommunications Council (IPTC).

In fact, there is a network of satellites, low-bandwidth wires and high-bandwidth trunks carrying news around the planet every minute of every day, pouring into news rooms and businesses everywhere.

Dumb Text on the Wire

If you peek into these wires, you see some very old-fashioned stuff. The news text is sent around the world in a variety of formats that can collectively be described as "dumb": straightforward text, with what little markup there is oriented to the needs of newspaper page composition, and old-fashioned pre-phototypesetting composition at that.

In these days of computer-driven everything and multimodal news presentation, the reliance on dumb text formats is becoming increasingly expensive. For example, the New York Times estimates that it spends several million dollars per year on laborious, error-prone hand conversion of news wire text to serve the needs of its many paper and electronic publications. It is not alone. News delivery services everywhere want to do Web sites, audiotext, customized services, and multimedia; dumb news wire text is standing in the way of all these things.

Smartening up the Text

One obvious solution is to make the text smarter. For years, the NAA and IPTC have been working on the News Industry Text Format (NITF), and at the start they very sensibly chose SGML as the appropriate format.

Of course, publishing a DTD, as many organizations have learned over the years, is no guarantee that people are going to start picking it up and deploying it.

The news industry suffers from an immense legacy problem: every significant newspaper and radio station in the world has one or more news wire connections, and has probably had these for decades. If you look at the technology behind these wires, you see a lot of very old stuff: sixteen-bit minicomputers, 110-baud modems, dedicated character-only authoring systems.

If you look at the businesses behind these wires, you find a culture that has always been stingy and in recent years has become frightened about its future growth and perhaps even its survival. There is resistance on the part of management to investing a lot of money in new technology, and (sometimes more) resistance on the part of journalists to any changes that slow them down or "get in the way".

So, while there have been a few trials of the NITF (notably at the German DPA agency), the text flowing over the world's news wires today remains, by and large, very dumb.

Another reason is the "usual suspects" that always seem to stand in the way of deploying SGML: the products are said to be too expensive, too hard to deploy, and not stable enough, and to come from companies that nobody's heard of.

Moving to XML

As you might expect, XML is starting to look very useful to these people. At a series of NAA/IPTC meetings that wound up recently in Portland, Oregon, the NITF DTD was, with surprising ease, rebuilt and made XML-conformant. Fortunately, there was little reliance on SGML arcana (besides a few "pernicious mixed content" declarations), and the transition, for those who have already started building SGML-based systems, will be painless.
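To make the idea of "smarter" text concrete, here is a minimal sketch in Python of how one marked-up story might feed two different outputs without hand conversion. The element names (story, headline, dateline, and so on) and the sample story itself are invented for illustration; they are assumptions, not taken from the actual NITF DTD.

```python
import xml.etree.ElementTree as ET

# Made-up sample story. The element and attribute names are
# illustrative assumptions, NOT the real NITF vocabulary.
STORY = """\
<story>
  <headline>River Crests Above Flood Stage</headline>
  <dateline>
    <location>Vicksburg, Mississippi</location>
    <date iso="1998-08-12">August 12</date>
  </dateline>
  <body>
    <p>Officials said the river rose overnight.</p>
    <p>Crews reinforced levees through the morning.</p>
  </body>
</story>
"""


def to_web_summary(xml_text):
    """Extract the fields a Web front page might want."""
    root = ET.fromstring(xml_text)
    return {
        "headline": root.findtext("headline"),
        "location": root.findtext("dateline/location"),
        "date": root.find("dateline/date").get("iso"),
        "lead": root.findtext("body/p"),  # text of the first paragraph
    }


def to_plain_text(xml_text):
    """Flatten the same story into wire-style plain text."""
    root = ET.fromstring(xml_text)
    lines = [root.findtext("headline").upper()]
    lines += [p.text for p in root.iter("p")]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_web_summary(STORY))
    print(to_plain_text(STORY))
```

The point is only that once the headline, dateline, and paragraphs are labeled as such, each publication channel can take what it needs from the same source; a real deployment would work against the published NITF DTD rather than these placeholder names.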

There will be a full-day "Vendors' Forum" on the NITF held at the Holiday Inn Crowne Plaza in Lyon, France on October 10th; attendance is by prior registration, inquiries by email.

Our Take

Moving from SGML to XML will not likely make smartening up the world's news wires any easier, but it should make undertaking this task a much easier sell. The good news is that a confluence of expiring hardware and looming Y2K problems is starting to place immense pressure on the news industry to do some technology updating anyhow; so there is a significant chance that in the near future, we'll see that worldwide round-the-clock flow of news become a little smarter, a little more re-usable, and a little more long-lived.

For anyone who believes in the core ideas of SGML and XML, this has to be exciting news, because the news wires' contents, at any one moment, are a snapshot of the world's history at that moment. For now, a snapshot is all it is, because the text is too dumb to do anything with but look at. When the XML NITF markup becomes ubiquitous, the news wire contents will move from being a snapshot to being a chronicle, a textbook, a permanent reusable record for the ages of how history looks as it happens.