Creating XML with Genx

June 23, 2004

Michael Fitzgerald

Genx is an easy-to-use C library for generating well-formed XML output. In addition to being well-formed, Genx writes all output in canonical form. It was created by Tim Bray with help from members of the xml-dev mailing list. Work on Genx was announced on xml-dev on 19 January 2004. Some of the benefits of Genx include size, efficiency, speed, and the integrity of its output. Genx is well documented; it's fairly easy to figure out what's going on just by looking at the well-commented source code.

This article will show you how to download, install, and compile Genx; then it will walk you through two example programs. The article assumes that you are familiar with XML and the C programming language, and that you have a C compiler and the make build utility available on your system. The example programs in this article have been tested under version beta5 of Genx.

Setting Up Genx

The first thing you have to do is download Genx. It comes in a tarball only. After you download it to a working directory, you need to extract the files. While at a shell or command prompt, change directories to a working directory you've set up for Genx. If you are on a machine that runs a Unix operating system, decompress the Genx tarball (e.g., gzip -d genx.tgz), then extract the tar file genx.tar (e.g., tar xvf genx.tar). This creates a genx subdirectory where all the files from the archive will be extracted. (If you are on Windows without Cygwin, you can use a utility like WinZip to extract the GZIP archive.)

Compiling Genx

Genx comes with a Makefile for building the project. While in the genx subdirectory, just type make, and the process begins. The build will compile the needed files genx.c and charProps.c. genx.c includes the genx.h header file; charProps.c is where character properties are stored, and it is apparently used to test for legal characters in XML.

The ar (archive) command is invoked to create an archive from the object files genx.o and charProps.o. The archive is called libgenx.a. The ranlib utility is also invoked to create an index for the archive. You will need to use libgenx.a when you compile your own Genx programs. One other program, tgx.c, is also compiled and run. This program runs a number of tests on Genx and reports on what it finds so you know everything is working as intended.

A First Example

Several test programs are provided in the Genx package and are stored under the docs subdirectory. I have written a few additional sample programs that I'll highlight here. You can download these programs. Place this archive under your genx subdirectory and extract the contents there to the genx-examples subdirectory. Change directories to genx-examples and type make again (the example archive I've provided also has its own Makefile). After you invoke make in genx-examples, the example programs will be built and ready to go.

First, here is a simple C program called tick.c that uses functions from the Genx library:


#include <stdio.h>
#include "../genx.h"

int main()
{
  genxWriter w = genxNew(NULL, NULL, NULL);

  genxStartDocFile(w, stdout);
   genxStartElementLiteral(w, NULL, "time");
    genxAddAttributeLiteral(w, NULL, "timezone", "GMT");
    genxStartElementLiteral(w, NULL, "hour");
     genxAddText(w, "23");
    genxEndElement(w);
    genxStartElementLiteral(w, NULL, "minute");
     genxAddText(w, "14");
    genxEndElement(w);
    genxStartElementLiteral(w, NULL, "second");
     genxAddText(w, "52");
    genxEndElement(w);
   genxEndElement(w);
  genxEndDocument(w);
}

The second line of the program is an #include directive for the copy of the genx.h header file that is located in the directory above genx-examples, provided that Genx and the examples were installed as directed. You can also place a copy of genx.h in the location for system include files (on my Cygwin system, for example, the location is c:/cygwin/usr/include). If a copy of genx.h is in the system include location, you can change the #include directive to #include <genx.h>.

The first statement inside main creates a writer for the output of the program. The variable w is of type genxWriter, and it's initialized by the genxNew function. (Looks like a Java constructor, doesn't it?) genxWriter is a pointer to the struct genxWriter_rec, which stores all kinds of information about the document being built. The three arguments to the genxNew function are for memory allocation and deallocation. When all three arguments are set to NULL, we are basically instructing Genx to use its default memory handling (with malloc and free).
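To make the role of those three arguments more concrete, here is a minimal sketch of a pair of allocation callbacks that count how often they are called. It assumes the callback shapes declared in genx.h are void *(*)(void *userData, int bytes) for the allocator and void (*)(void *userData, void *data) for the deallocator; check your copy of the header before relying on that. The names alloc_stats, counting_alloc, and counting_dealloc are hypothetical and are not part of Genx.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping struct, handed to the callbacks as userData. */
typedef struct {
    long allocations;   /* number of successful allocations */
    long frees;         /* number of blocks released */
} alloc_stats;

/* Allocator callback. Assumption: this matches the shape of the first
 * argument of genxNew, void *(*)(void *userData, int bytes). */
void *counting_alloc(void *userData, int bytes)
{
    alloc_stats *stats = userData;
    void *p;

    assert(stats != NULL && bytes >= 0);
    p = malloc((size_t)bytes);
    if (p != NULL)
        stats->allocations++;
    return p;
}

/* Deallocator callback. Assumption: this matches the shape of the second
 * argument of genxNew, void (*)(void *userData, void *data). */
void counting_dealloc(void *userData, void *data)
{
    alloc_stats *stats = userData;

    assert(stats != NULL);
    if (data != NULL) {
        stats->frees++;
        free(data);
    }
}

/* With Genx linked in, the writer would be created along these lines:
 *
 *   alloc_stats stats = {0, 0};
 *   genxWriter w = genxNew(counting_alloc, counting_dealloc, &stats);
 */
```

The third argument to genxNew is the userData pointer that is handed back to both callbacks, which is how the sketch reaches its counters without global variables.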

Following this initialization of a writer is a series of function calls, each with a small job. Notice that the first or only argument to each of these functions is w, the writer structure. The call to genxStartDocFile starts the writing process. The second argument, stdout, indicates that the document will be written to standard output. (The document could otherwise be written to a file, as you will see in the next example.) At the end of the program is a call to genxEndDocument, which signals the end of the document and flushes it.

The program also contains four calls to genxStartElementLiteral, each of which is terminated by a call to genxEndElement. genxStartElementLiteral has three arguments. The first is the writer structure (w) explained previously, the second is a namespace name or URI (always NULL here), and the third is the element name, such as time or hour.

If you give an element a namespace URI in the second argument, Genx writes the namespace URI on the element with an xmlns attribute and automatically creates a prefix, which is used on any child elements that have the same namespace declared.

The text content for a given element, if any, is created with genxAddText, with the second argument containing the actual text, such as 23 or 14.
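Because Genx guarantees well-formed, canonical output, text handed to genxAddText cannot simply be copied through: characters such as < and & have to be escaped. The following is only a sketch of the escaping that Canonical XML prescribes for text content (&, <, >, and carriage return), written as a stand-alone function. The name escape_text is hypothetical, and this is not Genx's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Genx: copies `in` to `out`, replacing
 * the characters that Canonical XML escapes in text content.
 * Returns 0 on success, -1 if `out` (capacity `cap` bytes) is too small. */
int escape_text(const char *in, char *out, size_t cap)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        const char *rep;
        char single[2];
        size_t len;

        switch (*in) {
        case '&':  rep = "&amp;"; break;
        case '<':  rep = "&lt;";  break;
        case '>':  rep = "&gt;";  break;
        case '\r': rep = "&#xD;"; break;
        default:                    /* ordinary character: copy as-is */
            single[0] = *in;
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);
        if (used + len + 1 > cap)   /* keep room for the terminator */
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    if (used + 1 > cap)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Plain strings such as 23 or 14 pass through unchanged, which is why the text in tick.c appears in the output exactly as written.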

You can probably guess that genxAddAttributeLiteral writes an attribute on the element that is created immediately before it. It has four arguments. The first is the writer structure, and the second is a namespace URI which is NULL if no namespace applies. The third argument is the attribute name and the fourth is the attribute value.

To run the program, just type tick at the prompt (it was compiled with make previously). The output of the program should look like this:


<time timezone="GMT"><hour>23</hour><minute>14</minute><second>52</second></time>

This output is canonicalized XML. Some obvious marks are no XML declaration, no whitespace between element tags, and double quotes rather than single quotes around attribute values. Now let's look at a Genx example that is a little more complex.

Another Approach

In the next example we will explore a different approach for writing an XML document with Genx. The program tock.c declares elements, an attribute, and a namespace before it uses them, then writes elements and an attribute with different functions that are more efficient than their "literal" counterparts. It also writes its non-canonical output to a file. Here is the code:


#include <stdio.h>
#include "../genx.h"

int main()
{
  genxWriter w = genxNew(NULL, NULL, NULL);
  FILE *f = fopen("tock.xml", "w");
  genxElement time, hr, min, sec;
  genxAttribute tz;
  genxNamespace tm;
  genxStatus status;
  tm = genxDeclareNamespace(w, "http://www.wyeast.net/time", "tm", &status);
  time = genxDeclareElement(w, tm, "time", &status);
  tz = genxDeclareAttribute(w, NULL, "timezone", &status);
  hr = genxDeclareElement(w, tm, "hour", &status);
  min = genxDeclareElement(w, tm, "minute", &status);
  sec = genxDeclareElement(w, tm, "second", &status);

  genxAddText(w, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
  genxStartDocFile(w, f);
  genxPI(w, "xml-stylesheet", " href=\"tock.xsl\" type=\"text/xsl\" ");
  genxComment(w, " the current date ");
  genxAddText(w, "\n");
  genxStartElement(time);
   genxAddAttribute(tz, "GMT");
   genxAddText(w, "\n ");
   genxStartElement(hr);
    genxAddText(w, "23");
   genxEndElement(w);
   genxAddText(w, "\n ");
   genxStartElement(min);
    genxAddText(w, "14");
   genxEndElement(w);
   genxAddText(w, "\n ");
   genxStartElement(sec);
    genxAddText(w, "52");
   genxEndElement(w);
   genxAddText(w, "\n");
   genxEndElement(w);
  genxEndDocument(w);
}

The second line after main creates a FILE pointer by calling the fopen function with a filename (tock.xml) where the output is to be written and a mode string ("w") that opens the file for writing. The mode string has nothing to do with the writer variable w. Following that, four elements (time, hr, min, and sec) are declared to be of type genxElement. The attribute tz is declared to be of type genxAttribute, and the namespace tm is declared with genxNamespace. status is of type genxStatus, an enum whose values, such as GENX_SUCCESS and GENX_BAD_NAME, report the outcome of an operation. The address of status (&status) is passed as the last argument of the declaration functions that follow.
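The listing does not check whether fopen succeeded. As a minimal sketch of how that check might look, the helper below opens a file in "w" mode and reports a failure on standard error. The name open_output is hypothetical and has nothing to do with the Genx API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens `path` for writing. Mode "w" creates the
 * file, or truncates it if it already exists. Returns NULL on failure
 * after printing the reason to stderr. */
FILE *open_output(const char *path)
{
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    return f;
}
```

In tock.c, the result of open_output("tock.xml") could be tested against NULL before it is handed to genxStartDocFile, and the file closed with fclose once genxEndDocument has returned.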

After the initial declarations, all these variables are initialized with an appropriate function, genxDeclareNamespace, genxDeclareElement, and genxDeclareAttribute. For example, the namespace variable tm is given a namespace name (http://www.wyeast.net/time) and a prefix (tm) with the genxDeclareNamespace function:


tm = genxDeclareNamespace(w, "http://www.wyeast.net/time",
    "tm", &status);

The genxAddText function inserts strings — an XML declaration and new line characters and spaces — into the file output stream. The addition of the XML declaration is what makes the output non-canonical.

The functions genxPI and genxComment write an XML stylesheet processing instruction and a comment, respectively. Then the functions genxStartElement and genxAddAttribute begin writing the markup. These functions take a previously declared object rather than literal name strings, and they perform better than their counterparts genxStartElementLiteral and genxAddAttributeLiteral. Other functions, such as genxAddText and genxEndElement, may be used with both variations of the element and attribute creation functions, for example to insert whitespace between elements.

To run the program, type the word tock at a command or shell prompt. Genx will then create the file tock.xml, shown here:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet  href="tock.xsl" type="text/xsl" ?>
<!-- the current date -->

<tm:time xmlns:tm="http://www.wyeast.net/time" timezone="GMT">
 <tm:hour>23</tm:hour>
 <tm:minute>14</tm:minute>
 <tm:second>52</tm:second>
</tm:time>

Just for fun, this non-canonical output can be transformed with XSLT stylesheet tock.xsl and validated with the RELAX NG schema tock.rng. Both files are in the example archive.

Wrap Up

There are a number of other Genx functions that I have not touched on, such as the memory management functions genxGetAlloc and genxSetAlloc. My take is that Tim Bray is on the right track, and that if you use C and you need to generate XML output, you will no doubt find that Genx is an efficient tool.