
XML on the Cheap

June 27, 2001

Edd Dumbill

Fun things, useful services and neat tricks you can use with XML without paying a penny.

If you're new to XML, or simply want a starting point to play around with it a little, there are plenty of resources on the Web you can use for free, many without even installing software on your computer.

Content

The best place to start with XML is with an XML document. If you don't have any data of your own you feel particularly strongly about, there are several good sources of XML content on the Web which you can use. If you want large amounts of text, then Project Gutenberg is a good bet. Specifically, members of the HTML Writers Guild have marked up many Gutenberg texts in XML. Additionally, the HWG folks have provided CSS stylesheets for their texts, so you can browse the texts from Internet Explorer or Mozilla in their native XML.

Presuming you don't get lost in the wonders of literature and are still determined to do some messing around with XML, you'll have some XML that looks rather like this:

<book>

<meta id="Description" content="This is the e-text version of the book
Tour through the Eastern Counties of England, 1722 by Daniel Defoe,
taken from the original e-text ttece10.txt." />
<meta id="XMLFormatting" content="Arthur Wendover,
 mailto:wendover@soon.com, UnitySpot.com" />

<frontmatter>
<titlepage>
<title>Tour through the Eastern Counties of England, 1722</title>
<author>by Daniel Defoe</author>
</titlepage>

</frontmatter>
<bookbody>
<chapter>
<para>. . . </para>
<para>
I began my travels where I purpose to end them, viz., at the City
of London, and therefore my account of the city itself will come
last, that is to say, at the latter end of my southern progress;
and as in the course of this journey I shall have many occasions to
call it a circuit, if not a circle, so I chose to give it the title
of circuits in the plural, because I do not pretend to have
traveled it all in one journey, but in many, and some of them many
times over; the better to inform myself of everything I could find
worth taking notice of.
</para>
...

Aside from Gutenberg, there are other sources of XML text. Jon Bosak has prepared himself well for being stranded on a desert island, having created XML versions of The Plays of Shakespeare and Four Religious Works (the New and Old Testaments, the Quran and the Book of Mormon).

If you're looking for something a little more modern and newsy, then it's worth investigating the various sources of RSS files available on the Web. RSS is a simple format for carrying headlines and descriptions of articles on web sites.

There are several sites aggregating RSS from all around the Web, making it accessible for free via HTTP. O'Reilly's Meerkat is one of the most flexible of these. For a complete tutorial on getting XML out of Meerkat, see Rael Dornfest's instructions. Here's an extract of RSS 0.91 data from Meerkat covering XML news:

<rss version="0.91">
 <channel>
  <title>Meerkat: An Open Wire Service</title>
  <link>http://meerkat.oreillynet.com</link>
  ...

<item>
<title>Xerces-J is W3C XML Schema complete.</title>
<link>http://www.xmlhack.com/read.php?item=1282</link>
<description>Lisa Martin has announced Xerces-J 1.4.1, a new version of the
Apache XML Project&#039;s Java-based XML parser, with an implementation of W3C
XML Schema considered virtually complete. </description>
</item>

<item>
<title>RDFStore 0.4</title>
<link>http://www.xmlhack.com/read.php?item=1284</link>
<description>Alberto Reggiori has released a major update to his Perl framework
for managing RDF databases, RDFStore.</description>
</item>

 ...
 </channel>
</rss>
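Once you have a feed like the extract above, pulling the headlines out of it takes only a few lines of code. What follows is a minimal sketch using the ElementTree module from Python's standard library. The sample string is abridged from the Meerkat extract, and the function name list_headlines is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Meerkat extract shown above.
RSS_SAMPLE = """<rss version="0.91">
 <channel>
  <title>Meerkat: An Open Wire Service</title>
  <link>http://meerkat.oreillynet.com</link>
  <item>
   <title>Xerces-J is W3C XML Schema complete.</title>
   <link>http://www.xmlhack.com/read.php?item=1282</link>
  </item>
  <item>
   <title>RDFStore 0.4</title>
   <link>http://www.xmlhack.com/read.php?item=1284</link>
  </item>
 </channel>
</rss>"""


def list_headlines(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 0.91 document."""
    root = ET.fromstring(rss_text)
    headlines = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is absent.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        headlines.append((title, link))
    return headlines


for title, link in list_headlines(RSS_SAMPLE):
    print(title, "->", link)
```

RSS 0.91 uses no XML namespaces, which is why plain element names such as "item" are enough here. Other RSS versions may need namespace-aware lookups.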

If it's data you want, then the great-granddaddy of all XML proof-of-concept demonstrations is available from NASDAQ: a stock quote service. This query retrieves the price of everybody's favorite stock, Microsoft:

http://quotes.nasdaq.com/quote.dll?page=xml&mode=stock&symbol=MSFT

The XML returned covers all you need to know in extreme detail. (But note that just because you can get this data from the Web doesn't mean it's yours to republish. Make sure you know what you're allowed to do with any data you find on the Web.)

Getting it Right

If you opt to create your own content, you'll likely want some automated way of checking that it's well-formed or valid XML. There are several online services that enable you to do that, presuming you can make your XML available on a web server somewhere.

Another quick way to get content into XML is to use HTML Tidy, a tool which can be fed HTML and return it encoded in XML as XHTML. The W3C kindly offer HTML Tidy both as software and as an online service.
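If you would rather check well-formedness on your own machine, any XML parser will do the job, since a conforming parser must reject a document that is not well-formed. The sketch below uses Python's standard ElementTree parser; the function name check_well_formed is made up for illustration. Note that this parser is non-validating, so it tests well-formedness only and says nothing about validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)
    return True, None


good = "<para>I began my travels at the City of London.</para>"
bad = "<para>I began my travels at the City of London.<para>"  # never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```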

Adding Style

XSLT stylesheets are a very popular way to manipulate XML content. The W3C offers a web-accessible XSLT processing servlet. The service simply requires two URLs, one for your XML document and one for your XSLT stylesheet. Using this you can put together a stylesheet (see our What is XSLT? and Transforming XML articles for ideas here), upload it to your web server and view the results of the transformation online.

There's more to XSLT than just transforming XML into a presentation format. The output of an XSLT transformation can be any XML document you wish. The W3C has used this to good effect with one of its services, which applies an XSLT stylesheet to its home page (written, naturally, in XHTML) to automatically create an RSS 1.0 channel of the latest news on its site. This is an example of how XML, XSLT, and HTTP are quite a powerful application framework in their own right, before you even start writing programs on your local machine.
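To make the XML-in, XML-out idea concrete, here is a rough sketch of the same kind of transformation written in Python with ElementTree rather than XSLT. Everything specific in it is an assumption for illustration: the sample page, the class="news" convention for marking headlines, and the function name xhtml_to_rss are all invented, and none of it reflects the W3C's actual markup or stylesheet. For simplicity it also emits the plain RSS 0.91-style elements shown earlier rather than RSS 1.0.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical home page: news items are marked with class="news".
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
 <head><title>Example Home Page</title></head>
 <body>
  <h1>Example Home Page</h1>
  <p class="news"><a href="http://example.org/a">First headline</a></p>
  <p>Unrelated paragraph.</p>
  <p class="news"><a href="http://example.org/b">Second headline</a></p>
 </body>
</html>"""


def xhtml_to_rss(page_text, channel_link):
    """Build a small RSS 0.91-style tree from the news links in an XHTML page."""
    ns = {"h": XHTML_NS}
    page = ET.fromstring(page_text)

    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = page.findtext(
        "h:head/h:title", namespaces=ns)
    ET.SubElement(channel, "link").text = channel_link

    # One <item> per link found inside a paragraph marked class="news".
    for anchor in page.findall(".//h:p[@class='news']/h:a", ns):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = anchor.text
        ET.SubElement(item, "link").text = anchor.get("href")
    return rss


rss = xhtml_to_rss(PAGE, "http://example.org/")
print(ET.tostring(rss, encoding="unicode"))
```

An XSLT stylesheet would express the same selection declaratively, with a template matching the marked paragraphs. The point in either case is that well-formed XHTML can be queried like any other XML document.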

Cheap XML-based Web Publishing

Blogger is a free service that lets you create web sites through online web forms. You simply feed it details of where to store your pages, tweak its templates a little and get going. By default, Blogger will store HTML on your web site. This isn't a very maintainable solution and leaves you in trouble should you wish to switch the tool you use to maintain your site.

Happily, it's relatively straightforward to alter Blogger to send XML to your web site. Providing your hosting service supports ASP (as in the referenced article) or some other method of transforming XML to HTML, it's easy to separate your web site content from its presentation and insure yourself against changes in your toolset. You also get more value out of the content you input into Blogger as it becomes reusable in different contexts and available for programmatic manipulation.

Fun with Web Services

If you're prepared to string a few lines of your favorite programming language together, there are other services accessible on the Web via XML and HTTP interfaces that you can use. Many of these are implemented in SOAP or XML-RPC. These two protocols essentially prescribe a way of wrapping up messages in XML so that a program on one machine can communicate with a remote program.
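To see what "wrapping up messages in XML" means in practice, you can ask an XML-RPC library to show you the request it would send. The sketch below uses xmlrpc.client, the Python 3 standard library successor to xmlrpclib, and borrows the method name and parameters from the Meerkat query used in this article's Python fragment. Nothing is sent over the network; the call is only serialized and then decoded again.

```python
import xmlrpc.client

# Parameters for one remote call, passed as a tuple of arguments.
params = ({"search": "/[Pp]ython/", "num_items": 5, "descriptions": 0},)

# Serialize the call as an XML-RPC request body (a <methodCall> document).
request_xml = xmlrpc.client.dumps(params, methodname="meerkat.getItems")
print(request_xml)

# Decode the same XML the way the receiving program would.
decoded_params, method_name = xmlrpc.client.loads(request_xml)
print(method_name, decoded_params)
```

The printed request is an ordinary XML document that is sent as the body of an HTTP POST. That is all a client library does on your behalf before handing back the decoded response.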

XML-RPC is an easy way to get started with these web services. Start off by downloading an implementation for your programming language of choice. Play around with the examples in your distribution a bit, then try out one of the public XML-RPC services. If you tried out Meerkat's XML features described earlier in the article, then try out its XML-RPC interface too. This small fragment of Python shows how easy it is to get news from Meerkat:

>>> from xmlrpclib import *
>>> server = Server("http://www.oreillynet.com/meerkat/xml-rpc/server.php")
>>> print server.meerkat.getItems(
...     {'search':'/[Pp]ython/','num_items':5,'descriptions':0})

UserLand Software offers both XML-RPC and SOAP interfaces to its web-based content management system, Manila. As it describes on its site:

A browser-based interface is wonderful, but sometimes when you're editing a long story, specification or agreement, you need a better editor than the one built into web browsers. This interface allows developers of text editing software to enable writers to edit text in a Manila site as easily as they edit text within a web browser.

That integration between web sites and other systems is making a lot of people excited about Web Services at the moment. UserLand and several other companies offer hosting of Manila sites. Other free XML-RPC services include spell-checking and forum software.

There are also many SOAP services you can get started with in a relatively short time and with not much more effort than XML-RPC. XMethods is a handy web site listing public services, with anything from checking airfare information to finding MP3s and generating Shakespearean-style insults.

Unlike the W3C XSLT service described above, the web service model does insulate you a little more from the nuts-and-bolts of the XML and steers you toward application development. For more on web services, check out O'ReillyNet's Web Services DevCenter.

Conclusion

This article has only scratched the surface of the free resources available on the Web for XML. Thanks to the interoperability of XML and HTTP, many of these services can be used together, and are also easy to integrate into your own programs. For a more comprehensive list of XML resources on the Web, use the XML.com Resource Guide.