Introducing o:XML

July 21, 2004

Martin Klang

Most of us tend to think of XML as a useful, general-purpose data format. New vocabularies are springing up all the time, as are tools for XML processing. Still, we haven't yet realized the potential of XML as a facilitator in the software development process itself.

o:XML is one project that promises to deliver the same benefits for programming that XML already offers for data representation. Furthermore, it may just make it easier than ever to code XML applications.

So what is o:XML? Well, it's a dynamically typed, general-purpose object-oriented programming language. It's got threads, exception handling, regular expressions, namespaces, and all the other things you would expect from a modern language. And it's expressed entirely in XML. Maybe o:XML is a bit like Python crossed with XML. To give you more of an idea, here's a version of Hello World, complete with type and function declarations:

Example 1: Hello World

<?xml version="1.0"?>
<o:program>
  <o:type name="HelloWorld">
    <o:function name="hello">
      <o:do>
        <o:return select="'Hello World'"/>
      </o:do>
    </o:function>
  </o:type>
  <o:set instance="HelloWorld()"/>
  <o:eval select="$instance.hello()"/>
</o:program>

XML being XML, the result is a verbose language that many will find difficult to read and hard work to write. It doesn't look like a "real" programming language like C or Java. And let's face it, there are already too many programming languages in the world. So why o:XML? Before addressing that question I will offer a short overview of the language itself.

Language Overview

Running an o:XML program produces XML output -- XML contained within the program is simply copied to the output. With this knowledge we can rewrite Hello World as follows:

Example 2: Hello Again

<?xml version="1.0"?>
<o:do>Hello World!</o:do>

Since o:XML is object-oriented, everything is an object and every object has a type. The basic types are String, Element, Document, etc., representing types of XML nodes. There are also collection types and an evolving set of core libraries. Users can, of course, create their own types and functions.

o:XML has an expression language, o:Path, that is very similar to XPath. The main difference is that o:Path allows functions to be invoked on nodes and node sets, just like with objects in Java or C++. Since in o:XML nodes are objects, o:Path can be used not only to find and select nodes, but also to change their state.

Example 3: o:Path Expression

//a[@href.contains('cnn.com')].remove()

Removes all a elements with href attributes containing the text cnn.com.

In this example the function contains() is invoked on the href attribute of all a elements in the document. The function remove() is invoked on all a elements that match the predicate.
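For readers more at home in a mainstream language, the effect of this expression can be approximated in Python. The sketch below is only an analogue built on the standard library's ElementTree, not o:Path itself; the helper name remove_links and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

def remove_links(root, needle):
    """Remove every <a> element whose href attribute contains `needle`.

    Returns the number of elements removed.
    """
    removed = 0
    # ElementTree keeps no parent pointers, so visit each element as a
    # potential parent and drop matching children from it. The element
    # list is materialized first because the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "a" and needle in child.get("href", ""):
                parent.remove(child)  # also drops the child's tail text
                removed += 1
    return removed

# Hypothetical sample document with two links.
doc = ET.fromstring(
    "<html><body><ul>"
    '<li><a href="http://www.cnn.com/news">CNN</a></li>'
    '<li><a href="http://www.o-xml.org/">o:XML</a></li>'
    "</ul></body></html>"
)
removed = remove_links(doc, "cnn.com")
remaining = [a.get("href") for a in doc.iter("a")]
print(removed, remaining)
```

The explicit parent walk is what the one-line o:Path expression saves: selecting the nodes and acting on them happen in the same expression.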

Any valid XML document is a valid o:XML program. Furthermore, o:XML can be embedded anywhere in an XML document. Embedded use of o:XML means that dynamic content can be included in a document without breaking XML validity. o:XML exists in its own namespace, and respects document integrity.

Say, for example, that you store your holiday pictures in a directory. You would like to produce an RSS feed with names and links to the pictures. We construct a simple program based on the structure of plain RSS, and use the o:Lib io module to access the file system.

Example 4: RSS Feed

<rss version="2.0">
  <channel>
    <title>Pleasant Pictures</title>
    <link>http://www.o-xml.org/</link>
    <description>An o:XML example.</description>
    <o:import href="lib/io.oml"/>                           <!-- 1 -->
    <o:set dir="io:File('.')"/>                             <!-- 2 -->
    <o:for-each name="file" select="$dir.list('*.jpg')">    <!-- 3 -->
      <item>
        <title><o:eval select="$file.name()"/></title>      <!-- 4 -->
        <link>file://<o:eval select="$file.path()"/></link>
      </item>
    </o:for-each>
  </channel>
</rss>

  1. Load the io module.
  2. Set the variable dir by creating a File that represents '.', the current directory.
  3. Iterate over all jpgs in the directory, assigning each one to the variable file.
  4. Get the file name and insert the value into the title element.

The above example represents a complete o:XML program. If you run it from the command line, it will produce RSS output with one item element for each JPEG file in the current directory. As with all o:XML programs, it can also run as part of a web application or from a build script.

The example could easily be extended. You might want to sort the pictures by date (using o:sort), or prettify the filename by removing the suffix and replacing underscores with spaces (using the regexp substitute() function). Alternatively, an existing XML data source such as the iPhoto AlbumData could be used to generate names and descriptions.
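The two suggested refinements are easy to picture in any language with regular expressions. The following Python sketch shows the idea only; it does not use o:sort or the o:XML substitute() function, and the function names, the sample file names, the timestamps, and the newest-first ordering are all assumptions made for illustration.

```python
import re

def pretty_title(filename):
    """Drop the file suffix, then turn underscores into spaces."""
    without_suffix = re.sub(r"\.[^.]+$", "", filename)
    return re.sub(r"_", " ", without_suffix)

def newest_first(pictures):
    """Sort (filename, modified_timestamp) pairs, most recent first."""
    return sorted(pictures, key=lambda picture: picture[1], reverse=True)

# Hypothetical directory listing: (file name, modification time in seconds).
pictures = [
    ("beach_day_1.jpg", 1090000000),
    ("city_at_night.jpg", 1090400000),
]
titles = [pretty_title(name) for name, _ in newest_first(pictures)]
print(titles)
```

Each title would then be written into the item's title element in place of the raw file name.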

What is nice about simple o:XML scripts like these is that there is no "impedance mismatch" between the programming language and the data format. Coding in o:XML means that you can work quickly and elegantly, never departing from the Web's native tongue.

The next example further illustrates how o:XML integrates with XML. The program produces an SVG bar chart that shows the number of words in a DocBook document, broken down by section. It takes one parameter, the document filename.

Example 5: DocBook Word Count

<o:program>
  <o:param name="file"/>                                    <!-- 1 -->
  <o:import href="lib/io/File.oml"/>                        <!-- 2 -->
  <svg:svg>
    <o:for-each select="io:File($file).parse()//section">   <!-- 3 -->
      <o:set x="count(preceding::section) * 30 + 20"/>      <!-- 4 -->
      <o:set words="count(.//text().match('\w+')) * 2"/>    <!-- 5 -->
      <svg:rect width="10" x="{$x}" y="{250 - $words}">     <!-- 6 -->
        <o:attribute name="height" select="$words"/>
      </svg:rect>
      <svg:g transform="rotate(45, {$x}, 260)">
        <svg:text x="{$x}" y="260">                         <!-- 7 -->
          <o:eval select="title/text()"/>
        </svg:text>
      </svg:g>
    </o:for-each>
  </svg:svg>
</o:program>

  1. file parameter declaration.
  2. Load the io:File library type.
  3. Iterate over all section elements in the parsed file.
  4. Calculate the horizontal position of the next bar.
  5. Count the number of words in this section.
  6. Produce an SVG rectangle.
  7. Create an SVG text tag with the title of this section, rotated 45 degrees.
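The arithmetic behind steps 3 to 6 can be checked with a small Python sketch. It is an approximation under stated assumptions, not a port of the program: sections are assumed not to be nested (so the number of preceding sections equals the loop index), the factor 2 is read as a vertical scale in pixels per word, and the function name and sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def section_bars(docbook_xml):
    """Compute bar geometry for each section of a DocBook-like document."""
    root = ET.fromstring(docbook_xml)
    bars = []
    for index, section in enumerate(root.iter("section")):
        # Count word matches text node by text node, title included.
        words = sum(len(re.findall(r"\w+", text)) for text in section.itertext())
        height = words * 2           # assumed scale: two pixels per word
        bars.append({
            "title": section.findtext("title", default=""),
            "x": index * 30 + 20,    # bars are 30 units apart, offset by 20
            "y": 250 - height,       # bars grow upwards from y = 250
            "height": height,
        })
    return bars

# Hypothetical two-section document.
sample = (
    "<article>"
    "<section><title>Intro</title><para>one two three</para></section>"
    "<section><title>More</title><para>four five</para></section>"
    "</article>"
)
bars = section_bars(sample)
for bar in bars:
    print(bar)
```

Subtracting the height from 250 reflects SVG's coordinate system, where y grows downwards, so every bar ends on the same baseline.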

Here is the output produced by the program when run against this article:

I ran the program from the command line with this command:

java -jar objectbox.jar -Dfile=introducing.xml svg-example.oml

When an o:XML program is deployed in a servlet engine, it will take its parameters from an HTTP request. This means that simple command-line scripts work without modification as web applications and web services. Our example could be invoked like this:

http://milt.local/examples/svg-example.oml?file=introducing.xml

o:XML programs can equally well be invoked by Ant build scripts, with Ant providing the parameter values dynamically. This can also be very useful, for example, to automate XML publishing tasks.
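The idea that one named parameter can arrive from either a command line or a URL can be illustrated with a short Python sketch. This is not how ObjectBox is implemented; the function names are hypothetical, and only the -Dname=value and ?name=value shapes are taken from the invocations shown above.

```python
from urllib.parse import parse_qs, urlsplit

def params_from_args(args):
    """Collect -Dname=value style arguments into a dictionary."""
    params = {}
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            params[name] = value
    return params

def params_from_url(url):
    """Collect query-string parameters, keeping the first value of each."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}

cli = params_from_args(["-Dfile=introducing.xml", "svg-example.oml"])
web = params_from_url(
    "http://milt.local/examples/svg-example.oml?file=introducing.xml"
)
print(cli == web, cli)
```

Because both sources reduce to the same name-to-value mapping, the program body never needs to know how it was invoked.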

X is for eXtensible

As expected, o:XML provides the same capabilities as most other modern languages. Being XML, it also allows for processing and extension, which do not come naturally to conventional programming languages. In practice this means, first, that we can harness the power of tools and technology developed for XML and, second, that our code integrates seamlessly with practically every other XML vocabulary.

We've just seen how o:XML programs can be in-lined in RSS, or used to generate SVG output. We will now take a brief look at how the source code itself can be extended.

The following is an example taken from the o:Lib implementation of an internet message. It includes two separate extensions: Documentation and Unit Tests. The automated build produces cross-referenced type documentation, unit tests, and test results. This is achieved with standard XML processing tools such as XSLT and an o:XML interpreter.

Example 6: Extensions in net:Message

<o:type name="net:Message">
  <doc:p>Represents an RFC822 Internet Message</doc:p>      <!-- 1 -->
  <o:function name="header">
    <o:param name="name"/>
    <doc:p>Get a message header</doc:p>                     <!-- 2 -->
    <o:do>
      <o:return select="$headers.get($name)"/>
    </o:do>
    <ut:test>                                               <!-- 3 -->
      <ut:input ref="msg1"/>                                <!-- 4 -->
      <ut:definition>                                       <!-- 5 -->
        <o:set msg="net:Message($input.string())"/>
        <o:return select="$msg.header('Host')"/>
      </ut:definition>
      <ut:result>localhost:8090</ut:result>                 <!-- 6 -->
    </ut:test>
  </o:function>

  <ut:dataset name="msg1">                                  <!-- 7 -->
    Content-Type: text/plain
    Content-Length: 8

    TestData
  </ut:dataset>
</o:type>

  1. Type documentation.
  2. Function documentation.
  3. Unit test declaration.
  4. Test input declaration.
  5. Test definition.
  6. Expected test result.
  7. Test data definition.
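Because the tests live in the source as ordinary XML, any XML toolkit can pull them out. The Python sketch below is one way a build step might list embedded tests; it is not part of the o:XML toolchain. The namespace URIs are placeholders (the article does not show the real ones), and the function name and trimmed sample source are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for this sketch only.
NS = {"o": "urn:example:o-xml", "ut": "urn:example:unit-test"}

# Trimmed-down stand-in for a type definition with one embedded test.
SOURCE = """
<o:type name="net:Message"
        xmlns:o="urn:example:o-xml" xmlns:ut="urn:example:unit-test">
  <o:function name="header">
    <o:param name="name"/>
    <ut:test>
      <ut:input ref="msg1"/>
      <ut:result>localhost:8090</ut:result>
    </ut:test>
  </o:function>
</o:type>
"""

def list_tests(source):
    """Return one record per ut:test found inside an o:function."""
    root = ET.fromstring(source)
    tests = []
    for function in root.iter("{%s}function" % NS["o"]):
        for test in function.findall("ut:test", NS):
            input_ref = test.find("ut:input", NS)
            tests.append({
                "type": root.get("name"),
                "function": function.get("name"),
                "input": input_ref.get("ref") if input_ref is not None else None,
                "expected": test.findtext("ut:result", namespaces=NS),
            })
    return tests

tests = list_tests(SOURCE)
print(tests)
```

The same traversal, pointed at doc:p elements instead, would be the starting point for generating documentation.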

There are no inherent limitations to the type or nature of extensions that may be incorporated into o:XML source code, as long as the data can be expressed in XML. Currently available extensions vary from documentation and embedded tests to declarations that alter the behavior of an application, such as Design By Contract assertions and Aspect Oriented Software Development definitions.

The possibilities are almost endless: An integrated development lifecycle starting with requirements, through definition and UML modeling, to development, test plans, and deployment, could be centered around a common XML information repository.

Why o:XML?

And so we return to the question: Why o:XML? My preferred answer would be: Using the same format for code as we use for data allows us to think slightly differently about the code. The application is not only a runtime executable, magically incantated by the source code -- it is structured data, information, a document!

With XML, the source code is made transparent: it can be processed in the same way, with the same tools, that we use for other structured information. Writing o:XML programs that generate or instrument other programs is remarkably easy. The XML code format allows for orthogonal extensions, integrates with development methodologies, and is a natural test-bed for abstract languages, meta-programming, and reflective programming.

o:XML is a practical language, designed to solve practical problems. For many types of applications it already provides faster, easier, better solutions than any other technology. If you are writing code that processes or produces XML, chances are you will benefit from using o:XML.

Related Work

There is a growing interest in defining an abstract XML programming language, and several independent projects have been started. One of the more evolved ones is Marko Topolnik's Jezix, which currently has a focus on Java but intends to be independent of the source language. The main purpose of Jezix is to provide a source-code representation for automated manipulations and orthogonal extensions. Another interesting project is JavaML, which aims to provide an XML representation of Java.

Other languages that have deep XML integration and treat XML as first-class constructs are XDuce and Water. XDuce is a functional language, while Water is object-oriented with a focus on web services. However, they both use a proprietary, non-XML syntax.

The field of web application programming was possibly the first to embrace the use of XML to encode business logic. JSTL provides a library for programming constructs that makes it possible to build logic blocks in XML. Jelly is an Apache project that builds on JSTL to enable easy integration of XML pipelines with Java code. TagBox, in turn, is a project that has grown to be a complete, extensible 4GL. It now supports procedure and module definitions in XML. Still, none of these technologies is really a programming language in its own right, and they offer little in terms of abstraction mechanisms.

Project Status

The key component of o:XML is the Java-based compiler and interpreter, ObjectBox, which recently went into production release with version 1.0. ObjectBox allows o:XML programs to run standalone, from an Ant build file, or within a servlet engine such as Tomcat. It also allows for integration with existing Java applications using either BSF or its own API. ObjectBox currently offers a full, stable implementation of o:XML.

The core language is supported by extensions and library code. The o:XML standard library is o:Lib, which in its first release includes file, stream, network, and XSLT functionality.

In addition to library code there are also several language extensions available. Java extensions allow Java classes to be used seamlessly in o:XML programs. The popular database extensions handle database connection pooling and transactions, while providing a uniquely simple, powerful, RDBMS-to-XML mapping mechanism.

The compiler/interpreter also has experimental support for reflection. The reflection extensions expose runtime information, such as types and functions, in XML format. This opens the information up for easy manipulation using XPath. For example, groups of type functions can be selected with simple XPath expressions based on their names, parameter types, or attributes of the types they belong to. The extension supports higher-order functions, partial function completion, and the usual reflection types.

Another attractive quality of XML programming languages is that they can be machine-translated using readily available technology. There is an experimental XSL transformation of o:XML programs into plain Java that has been developed as part of the o:XML project. It is still only partially complete but working. A more complete, public release will be available in the near future.

Looking to the Future

Since the first public release in early 2002, o:XML has certainly come a long way. And there's an awful lot of activity and related projects with deliverables currently in the pipeline. I think we can expect the field of XML programming to grow at an ever-accelerating rate in the next few years, taking both markup languages and programming in general to new levels.

Work has already started on the next-generation o:XML compiler. It will be centered on a portable, native-language transformation engine. The goal is to produce efficient code for a number of targets, including Java bytecode and C, using o:XML tools throughout.

There is also scope for producing a minimal-footprint engine for embedded and portable devices, such as PDAs and mobile phones. With the right tools, o:XML programs can be transformed into highly optimized code targeted at restricted or specialized platforms.

A great deal of work has already taken place on lifecycle and methodology integration. o:XML lends itself to processing and code generation like no other language, and using XML tools means that every step in the process is transparent. There's more information about workflow integration on the o:XML web site.

In the department of orthogonal source-code extensions, there is lots of work underway on developing standards and tools. Unit test and documentation extensions are currently in use across several projects. The specification and implementation of Design By Contract extensions were part of a paper presented at the Extreme Markup Conference 2003. The paper, XML and the Art of Code Maintenance, also included blueprints for Aspect Oriented Software Development in o:XML, which has since been further developed. Expect a public release soon!

Open Source o:XML Projects

ObjectBox: o:XML compiler and interpreter in Java

hatatap: HTTP test script language and tools

vendue: Online auctioning software written in o:XML with db extensions

o:Lib: o:XML core libraries

Resources