Functional Programming and XML
(A French translation of this article is available).
As is all too common in the programming world, much of the XML community has identified itself and all its works with object-oriented programming (OOP). While I'm a fan of OOP, it's clear to me that even the best OOP-for-XML techniques aren't a panacea and, moreover, that there is an awful lot of ad hoc "objectification" which tends merely to make our lives more difficult and our programs less clear.
This short-sightedness has two negative consequences: it tends to limit the techniques and tools we use, and it tends to restrict our understanding. For example, although the Document Object Model (DOM) satisfies few and inspires fewer, its status as a standard tends to inhibit (though, fortunately, not to suppress) exploration into alternative models and practices. The debate tends to revolve around "fixing" DOM, which is cast in terms of getting a better object model. While a better object model would be nice, it's important to keep in mind that XML is neither historically nor inherently object-oriented. Thinking otherwise may lead you to perceive or anticipate a better fit than you actually get.
One cure for intellectual myopia is to go exploring. In this article, I provide a beginner's travel guide to the interesting and instructive land of functional programming (FP) and XML.
Why Functional Programming?
If you are heavily into XML and use a wide range of XML-related technologies, it's highly likely that you are already doing some FP. XSLT is, more or less, the transformation language of DSSSL recast in an XML syntax; that transformation language is a proper subset of DSSSL, which is itself a purely functional subset of the Scheme programming language (plus a large library). Thus, not only are there historical connections between FP and XML, but critical present-day XML tools involve FP.
This connection is a bit more than accidental. XML is generally declarative, as are functional programming languages. XML is a metalanguage -- a language for defining languages. ML, one of the big time functional languages, is so named as an abbreviation for "Meta-Language". In general, FP languages excel at language definition and implementation. It's natural to think of XML documents as trees, and functional languages tend to have lots of nifty facilities for representing and manipulating trees. Finally, to wrap up the vagaries, the goal of working at the very high level of structural mark-up, where we specify documents in terms of their logical features rather than particular rendering procedures, is similar in spirit to the ideals of FP, where we strive to specify computations in mathematical terms rather than machine- or recipe-oriented terms.
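To make the "documents as trees" point concrete, here is a minimal sketch in Python, chosen only because it is widely available; parsing is done by the standard library's xml.etree.ElementTree. The sample document and the function names count_elements and text_of are invented for this illustration and come from no particular toolkit. Each function is defined by recursion on the shape of the tree and modifies nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
DOC = (
    "<article><title>FP and XML</title>"
    "<para>Trees are <em>everywhere</em>.</para></article>"
)


def count_elements(node):
    """Count the elements in the subtree rooted at node."""
    # One for this node, plus the counts of all child subtrees.
    return 1 + sum(count_elements(child) for child in node)


def text_of(node):
    """Concatenate all character data below node, in document order."""
    parts = [node.text or ""]
    for child in node:
        parts.append(text_of(child))    # text inside the child element
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


root = ET.fromstring(DOC)
print(count_elements(root))  # 4
print(text_of(root))         # FP and XMLTrees are everywhere.
```

Neither function needs a visitor class or any mutable state: the structure of the code simply mirrors the structure of the data.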
One need not take up with a fancy-pants FP language in order to benefit from FP tropes, techniques, and thinking. For example, the popular scripting languages Python (see the functional.py modules of the Xoltar Toolkit) and Perl (see this wonderful article among the many on this and other topics by M-J. Dominus) both have several FP-inspired features and can be used to experiment with FP techniques. However, it's worth investigating languages and systems designed from the ground up to support functional programming, if only because those most expert in FP tend to use them for their own XML processing.
(Note: I recommend http://lambda.weblogs.com/ for folks interested in a reasonably soft-core FP resource.)
As the name implies, functional programming is "function-oriented" programming (though C doesn't really count). FP languages allow you to manipulate functions in arbitrary ways: building them, combining them, passing them around, storing them in data structures, and so on. And it's key that FP functions are (at least, ideally) side-effect free. FP's declarative nature makes it much easier to deal with things like concurrency, data structure transformations, and program verification, validation, and analysis. Functions are to FP what objects are to OOP.
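For readers who have not seen functions treated as ordinary values, the following is a rough sketch, again in Python rather than a dedicated FP language. The helpers compose, add, and double are hypothetical names made up for this example. It shows the four moves mentioned above: building a function at run time, combining functions, passing them as arguments, and storing them in a list. None of the functions has a side effect.

```python
from functools import reduce


def compose(*funcs):
    """Combine functions right to left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


def add(n):
    """Build a new function at run time (a closure over n)."""
    return lambda x: x + n


def double(x):
    return x * 2


# Functions stored in a data structure, like any other value.
pipeline = [add(3), double, str]

# Combine them so that add(3) runs first, then double, then str.
run = compose(*reversed(pipeline))

print(run(4))                        # 14 (as a string)
print(list(map(add(1), [1, 2, 3])))  # [2, 3, 4]
```

Because each step depends only on its argument, run(4) gives the same answer no matter when or how often it is called, which is what makes this style easier to reason about.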
While there are many ways to classify functional languages according to technical features and formal properties, such divisions tend to be more useful to the FP experienced, degenerating into mere buzzwords for the novice. I have a rough and ready way of dividing FP languages which seems to capture how they feel to the casual FP dilettante:
- Lispy languages, especially Scheme (and, somewhat atypically, Rebol and Dylan).
- Type-obsessed languages, such as ML, OCaml, Clean, and Haskell.
- Prolog-derived languages, such as Erlang, Mercury, and Oz.
In spite of my hope that the XML.com readership contains no members of the FP community -- if just for the sake of my ambitions for "gatekeeper" status -- I recognize this to be an unrealistic fantasy. So, let me forestall vicious discussion-group comments by pointing out that Mercury is very type obsessed, that many of these languages are multi-paradigm, and that Common Lisp has plenty of nice functional features. As for the rest of you, I'm a tour guide, not a guru.
While I could generate specific criteria for these categories (e.g., Lispy languages tend to be based on the untyped lambda calculus, whereas type-obsessed languages tend to be based on typed variants), what I really use to make this division is my impression of the "look and feel" of the languages. So while Rebol doesn't use classic s-expression syntax (and thus doesn't tend to look like a Lisp and, indeed, tends to block or make difficult a bunch of classic Lispy moves), I find myself, when using it, thinking much the way I would were I using a Scheme system. Thus, if you master one language in a category, the rest are unlikely to throw you off your stride.
The other Big Thing about FP is that it tends to be the subject of quite a bit of academic research, to the point that FP, in general, tends to be dismissed as ivory tower nose blowing. In fact, there are several commercially supported implementations of various FP languages, many of the free compilers and environments are of very good quality, and FP is used in many high-powered production environments. At the other end of things, there are a fair number of friendly FP learning systems, ideal for a bit of dorking around without a large investment of time, energy, or installation havoc.
But the academic nature of FP is also an asset. While preparing this article, I read a slew of papers -- some of which described systems (e.g., XMLambda) not yet fully realized or released. However, the ideas and their presentation stretched my perspective on XML processing in refreshing ways. As I stated earlier, it's not just the lack of tools, but the constraints on our understanding, that we need to cure.
The rest of this article highlights some of the interesting features of three FP systems for processing XML: XMLambda (an XML-specific FP language), HaXml (XML facilities for Haskell), and the new XML support in Erlang.