WavePhore Backs XMLNews Initiative
April 20, 1999
A new XML initiative announced last week promises to spread the syndication of news content on the Web. XMLNews, developed by David Megginson, is an XML specification that describes the text and metadata of news content. The specification and related materials have been published at XMLNews.org.
Megginson is principal of Megginson Technologies, chair of the World Wide Web Consortium's (W3C) XML Information Set Working Group, and maintainer of the widely implemented Simple API for XML (SAX). His XMLNews effort was funded by WavePhore, a new-media content aggregator that provides NewsPak, a prelicensed news-feed distribution for web publishers.
Addressing a growing problem
As its number of sources has increased, WavePhore has struggled with the challenge of handling and distributing content that contains all manner of markup, much of which is not in a useful form for WavePhore's customers. "XML is changing the fundamental nature of news and information delivery over the Internet," said David Deeds, president and CEO of WavePhore. "The wide variety of formats and delivery mechanisms introduced in the past by news publishers defined a mission for WavePhore: aggregate, normalize, and redistribute news in a standard format that makes adding news to any web site easy and inexpensive."
At Spring Internet World, WavePhore announced its NewsPak service. Through content agreements with providers such as Associated Press and PR Newswire, NewsPak delivers content from 34 news sources that generate more than 2,000 stories each day. NewsPak will deliver its stories in the XMLNews format, making it the first normalized feed WavePhore has offered and the first in the industry supplied in XML form. The second vendor to announce that it intends to support XMLNews is Corel, manufacturer of WordPerfect, which plans to supply a news authoring tool.
Borrowing from prior experience
XMLNews borrows heavily from the News Industry Text Format (NITF), an XML DTD created last year by a committee of the leading newswire agencies, including Reuters, Associated Press, Agence France-Presse, Dow Jones, and others, under the auspices of the International Press Telecommunications Council (IPTC) and the Newspaper Association of America (NAA). The NITF DTD, which was originally expressed in SGML, is meant to do for text what the new IPTC header did for wire-service graphics a few years ago: bring the markup of newswire stories into the digital era. The markup used by present-day text newswires, the ANPA 1312 format, contains arcane codes that date back to composition systems from decades ago and is widely acknowledged to be in need of an overhaul.
Though he developed XMLNews without the cooperation or assistance of the NITF committee, Megginson expressed his admiration for the work its members have accomplished. "I can't praise NITF enough. Their work is very well thought-out, and saved me enormous effort in creating this new specification." Another bow Megginson made to NITF was compatibility: Any news story created to the XMLNews specification will conform to NITF. A key aspect of both formats is that markup can be carried forward, and also extended, as a news story makes its way from press agency to news publisher and thence to secondary publishers.
Why make a new format?
In our interview with Megginson, we asked him why he needed to develop a new format at all, when the press agencies had already developed one for news. Megginson cited two reasons. First, the large, fixed tag set of NITF posed a barrier to implementers who didn't have a use for the whole set. "One of the biggest problems in developing standards is that they tend to fall on the wrong side of the cost/benefit tradeoff," Megginson said. "I wanted to find a way to provide much of the benefit at a fraction of the cost. XMLNews is designed so that software can gracefully handle a subset of the full markup that might be contained in a news feed."
Second, Megginson wanted a way to extend NITF to cover additional metadata that other companies might want to create. "The problem is that traditional wire service news no longer encompasses all of the feeds a web aggregator or publisher deals with," Megginson explained. "I guarantee you, a year from now, some kid that no one had heard of today will be on the cover of every magazine because he's invented something that everyone wants to put onto their web site. I was looking for a way to future-proof the framework, because there's no way you can come up with a single set of metadata that can anticipate future developments."
The route to extensibility that Megginson chose was to split the metadata about the news item into a separate file that conforms to the Resource Description Framework (RDF), developed by the World Wide Web Consortium for the exchange of metadata over the Internet. XMLNews-Meta allows metadata for any kind of news information, including textual news stories, photos, audio or video clips, or even virtual 3-D worlds and interactive scripts.
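To illustrate why a separate RDF file helps with extensibility, here is a minimal Python sketch of a reader for such a metadata record. It is our own illustration, not code from the XMLNews specification. The RDF namespace is the W3C's; the property namespace (http://example.org/news-meta#), the property names (title, format, clipLength) and the file name story-0001.xml are all hypothetical. The point it shows is that a reader can collect every property it finds, including ones invented after the software was written.

```python
import xml.etree.ElementTree as ET

# The RDF syntax namespace is defined by the W3C.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical metadata record: the "m" namespace, its property names,
# and the story file name are assumptions made for this sketch.
SAMPLE_META = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:m="http://example.org/news-meta#">
  <rdf:Description rdf:about="story-0001.xml">
    <m:title>WavePhore Backs XMLNews Initiative</m:title>
    <m:format>text/xml</m:format>
    <m:clipLength>not known to older software</m:clipLength>
  </rdf:Description>
</rdf:RDF>"""


def read_metadata(xml_text):
    """Map each described resource to a dict of its property values."""
    root = ET.fromstring(xml_text)
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        props = {}
        for prop in desc:
            # ElementTree names tags as "{namespace}local"; keep the local part.
            local_name = prop.tag.split("}", 1)[-1]
            props[local_name] = (prop.text or "").strip()
        records[about] = props
    return records


if __name__ == "__main__":
    for resource, props in read_metadata(SAMPLE_META).items():
        print(resource, props)
```

Because the reader never checks property names against a fixed list, a feed that adds a new kind of metadata does not break it; software that understands the new property can use it, and the rest can pass it along untouched.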
Though Megginson says he did not intend to offend the NITF committee, the way he and WavePhore went about developing and announcing XMLNews without consulting the committee clearly rankled some of the members we spoke with. Some of the members were not aware of XMLNews at all before it was posted on the Web. None of them has stepped forward to endorse it.
Vendor support for NITF has been slow to materialize, but this year we expect to see several commercial implementations. Reed Technology Information Services, which markets the NewsView library system originally developed by Tribune and subsequently sold to Lexis-Nexis and Reed, expects to announce NITF support later this spring. CCI Europe, Atex, Baseview, and other suppliers of newspaper editorial systems are watching the standard closely and told us they would welcome its adoption by the industry. So far, only European press agencies have implemented NITF.
Syndication of news on the Web is spreading at a prodigious rate, and the current situation, in which nearly every feed has its own format, is a definite barrier both to resellers and to those trying to add syndicated news to their sites. Today, adding a news feed often requires writing custom "handler" routines, because each feed supplies its data in a different form. A single, normalized format that was widely adopted by vendors could go a long way toward alleviating these technical headaches and greatly reduce the time and expense of handling syndicated news.
Though the NITF committee was not included in the initial writing of the XMLNews specification, its members could see a significant benefit, once the dust settles, by backing the initiative. If they support XMLNews, the wire services could see their potential market expand because the technical challenges of handling a wire feed would be greatly reduced. Though Megginson acknowledges that XMLNews may not be the implementation IPTC or NAA envisioned, he said they should be proud that this is the largest commercial implementation of the NITF DTD to date, and one that will spur further interest in their work.
Vendors of newspaper systems that are in the midst of implementing NITF would have to modify their software to handle the new format. But there would be a clear benefit to doing so. The newspaper systems built to handle the old ANPA 1312 format presumed print, but a new generation of content-management systems handles both web and print output. By building in support for XMLNews, these vendors could offer a web-only version of their products that could appeal to publishers who publish news on their web sites but do not produce newspapers or magazines.
Other vendors could clearly benefit from being able to implement a simpler spec. A vendor making a low-end system, for example, could implement the XMLNews base tag set and ignore additional markup. Higher-end systems could add value by taking advantage of publisher-specific markup.
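As a rough illustration of how a low-end system might handle a base tag set and ignore everything else, consider the following Python sketch. It is our own example under stated assumptions: the element names (story, headline, byline, p, org, x-rating) are hypothetical stand-ins, not the actual XMLNews or NITF tag set. The reader keeps the text of inline markup it does not understand and skips unknown block-level elements rather than rejecting the story.

```python
import xml.etree.ElementTree as ET

# Hypothetical story: every element name here is an assumption made for
# this sketch, not the real XMLNews tag set.
SAMPLE_STORY = """<story>
  <headline>WavePhore Backs XMLNews Initiative</headline>
  <byline>Staff writer</byline>
  <p>NewsPak will carry stories from <org>Associated Press</org> and others.</p>
  <x-rating level="3"/>
</story>"""


def plain_text(elem):
    """Return the text inside elem, dropping any inline tags but keeping their text."""
    return "".join(elem.itertext()).strip()


def read_story(xml_text):
    """Read only the tags this simple reader knows; note the rest and move on."""
    root = ET.fromstring(xml_text)
    story = {"headline": None, "byline": None, "paragraphs": [], "ignored": []}
    for child in root:
        if child.tag == "headline":
            story["headline"] = plain_text(child)
        elif child.tag == "byline":
            story["byline"] = plain_text(child)
        elif child.tag == "p":
            # Unknown inline markup such as <org> is flattened to its text.
            story["paragraphs"].append(plain_text(child))
        else:
            # Unknown block-level markup is recorded, not treated as an error.
            story["ignored"].append(child.tag)
    return story


if __name__ == "__main__":
    print(read_story(SAMPLE_STORY))
```

A higher-end system could use the same loop but act on the tags this reader ignores, for example by turning the hypothetical org element into a link to a company profile.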
As with any standard, success hinges on adoption by both vendors and users. The NITF committee is updating its spec to include two dozen or so tags requested by vendors and users of library archive systems. That work is in draft form and is expected to be adopted by June. The committee's next meeting is scheduled to take place in about two weeks, and we understand that WavePhore has been invited to attend. We'll keep you posted.
See also: NewsPak review