Creating XML with Genx
Genx is an easy-to-use C library for generating well-formed XML output. In addition to being well-formed, Genx writes all output in canonical form. It was created by Tim Bray with help from members of the xml-dev mailing list. Work on Genx was announced on xml-dev on 19 January 2004. Some of the benefits of Genx include its small size, efficiency, speed, and the integrity of its output. Genx is well documented; it's fairly easy to figure out what's going on just by looking at the well-commented source code.
This article will show you how to download, install, and compile
Genx; then it will walk you through two example programs. The article
assumes that you are familiar with XML and the C programming language,
and that you have a C compiler and the make build utility
available on your system. The example programs in this article have
been tested under version beta5 of Genx.
The first thing you have to do is download Genx. It comes in a tarball
only. After you download it to a working directory, you need to
extract the files. While at a shell or command prompt, change
directories to a working directory you've set up for Genx. If you are
on a machine that runs a Unix operating system, decompress the Genx
tarball (e.g., gzip -d genx.tgz), then extract the tar
file genx.tar (e.g., tar xvf genx.tar). This
creates a genx subdirectory where all the files from the
archive will be extracted. (If you are on Windows without Cygwin, you
can use a utility like
WinZip to extract the GZIP
archive.)
Genx comes with a Makefile for building the
project. While in the genx subdirectory, just
type make, and the process begins. The build will compile
the needed files genx.c
and charProps.c. genx.c includes
the genx.h header file; charProps.c is where
character properties are stored, and it is apparently used to test for
legal characters in XML.
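To make "legal characters" concrete, here is a minimal sketch of the character test defined by the Char production of the XML 1.0 specification. It is not the code in charProps.c, which I have not reproduced here; the function name is_xml_char is a hypothetical helper invented for illustration.

```c
#include <stdbool.h>

/*
 * Returns true if the Unicode code point c matches the XML 1.0 "Char"
 * production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * Hypothetical helper for illustration; not part of the Genx API.
 */
bool is_xml_char(unsigned long c)
{
    if (c == 0x9 || c == 0xA || c == 0xD)
        return true;                      /* tab, line feed, carriage return */
    if (c >= 0x20 && c <= 0xD7FF)
        return true;                      /* most of the Basic Multilingual Plane */
    if (c >= 0xE000 && c <= 0xFFFD)
        return true;                      /* skips the surrogate block */
    return c >= 0x10000 && c <= 0x10FFFF; /* supplementary planes */
}
```

A generator that refuses to emit anything failing this test cannot produce a document that a conforming parser rejects on character grounds, which is presumably why Genx keeps such a table.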
The ar (archive) command is invoked to create an
archive from object files genx.o and
charProps.o. The archive is called
libgenx.a. The ranlib utility is also
invoked to create an index for the archive. You will need to link against
libgenx.a when you build your own Genx programs. One other
program, tgx.c, is also compiled and run. This program
runs a number of tests on Genx and reports on what it finds so you
know everything is working as intended.
Several test programs are provided in the Genx package and are
stored under the docs subdirectory. I have written a few
additional sample programs that I'll highlight here. You can download these
programs. Place this archive under your genx
subdirectory and extract the contents there to
the genx-examples subdirectory. Change directories
to genx-examples and type make again (the
example archive I've provided also has its
own Makefile). After you invoke make
in genx-examples, the example programs will be built and
ready to go.
First, here is a simple C program called tick.c that
uses functions from the Genx library:
#include <stdio.h>
#include "../genx.h"

int main(void)
{
    genxWriter w = genxNew(NULL, NULL, NULL);

    genxStartDocFile(w, stdout);
    genxStartElementLiteral(w, NULL, "time");
    genxAddAttributeLiteral(w, NULL, "timezone", "GMT");
    genxStartElementLiteral(w, NULL, "hour");
    genxAddText(w, "23");
    genxEndElement(w);                /* closes hour */
    genxStartElementLiteral(w, NULL, "minute");
    genxAddText(w, "14");
    genxEndElement(w);                /* closes minute */
    genxStartElementLiteral(w, NULL, "second");
    genxAddText(w, "52");
    genxEndElement(w);                /* closes second */
    genxEndElement(w);                /* closes time */
    genxEndDocument(w);
    return 0;
}
The second line of the program is an #include
directive for the copy of the genx.h header file that is
located in the directory above genx-examples, provided
that Genx and the examples were installed as directed. You can also
place a copy of genx.h in the location for system include
files (on my Cygwin system, for example, the location is
c:/cygwin/usr/include). If a copy of
genx.h is in the system include location, you can
change the #include directive to
#include <genx.h>.
The first statement inside main creates a writer for
the output of the program. The variable w is of
type genxWriter, and it's initialized by
the genxNew function. (Looks like a Java constructor,
doesn't it?)
genxWriter is a pointer to the struct
genxWriter_rec, which stores all kinds of information
about the document being built. The three arguments to the
genxNew function are for memory allocation and
deallocation. When all three arguments are set to
NULL, we are basically instructing Genx to use its
default memory handling (with malloc
and free).
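If you do want to supply your own memory handling, the following is a rough sketch of what an allocator pair might look like. It assumes that genxNew takes, in order, an allocator called with a user-data pointer and a byte count, a deallocator called with the user-data pointer and the block to release, and the user-data pointer itself; check the prototype in your copy of genx.h before relying on that. The names alloc_stats, counting_alloc, and counting_dealloc are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record that would be handed to Genx as user data. */
typedef struct {
    int  live_blocks;   /* allocations not yet released */
    long total_bytes;   /* bytes requested so far */
} alloc_stats;

/* Allocator: wraps malloc and records each successful request. */
void *counting_alloc(void *user_data, int bytes)
{
    alloc_stats *stats = user_data;
    void *p;

    if (bytes < 0)
        return NULL;
    p = malloc((size_t)bytes);
    if (p != NULL) {
        stats->live_blocks++;
        stats->total_bytes += bytes;
    }
    return p;
}

/* Deallocator: wraps free and records the release. */
void counting_dealloc(void *user_data, void *data)
{
    alloc_stats *stats = user_data;

    if (data != NULL) {
        stats->live_blocks--;
        free(data);
    }
}

/*
 * With Genx linked in, the writer would then be created along these lines
 * (argument order is an assumption, as noted above):
 *
 *     alloc_stats stats = {0, 0};
 *     genxWriter w = genxNew(counting_alloc, counting_dealloc, &stats);
 */
```

Passing the statistics record as user data keeps the allocator free of global state, so two writers could each have their own counters.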
Following this initialization of a writer is a series of function
calls, each with a small job. Notice that the first or only argument to
each of these functions is
w, the writer structure. The call to
genxStartDocFile starts the writing process. The second
argument, stdout, indicates that the document will be
written to standard output. (The document could otherwise be written
to a file, as you will see in the next example.) At the end of the
program is a call to genxEndDocument, which signals the
end of the document and flushes it.
The program also contains four calls to
genxStartElementLiteral, each of which is terminated by a
call to genxEndElement.
genxStartElementLiteral has three arguments. The first is
the writer structure (w) explained previously, the second is a
namespace name or URI (always NULL here), and the third
is the element name, such as time or
hour.
If you give an element a namespace URI in the second argument, Genx
writes the namespace URI on the element with an xmlns
attribute and automatically creates a prefix, which is used on any
child elements that have the same namespace declared.
The text content for a given element, if any, is created
with genxAddText, with the second argument containing the
actual text, such as 23 or 14.
You can probably guess that genxAddAttributeLiteral
writes an attribute on the element that is created immediately before
it. It has four arguments. The first is the writer structure, and the
second is a namespace URI which is NULL if no namespace
applies. The third argument is the attribute name and the fourth is
the attribute value.
To run the program, just type tick at the prompt (it
was compiled with make previously). The output of the
program should look like this:
<time timezone="GMT"><hour>23</hour><minute>14</minute><second>52</second></time>
This output is canonicalized XML. Some obvious marks are no XML declaration, no whitespace between element tags, and double quotes rather than single quotes around attribute values. Now let's look at a Genx example that is a little more complex.
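Before moving on, one canonical-form rule that tick.c never exercises is the escaping of special characters in element text. The sketch below follows the rule as stated in the Canonical XML specification (ampersands, angle brackets, and carriage returns in text are replaced by references); it is not Genx's own code, and escape_text is a hypothetical helper name.

```c
#include <string.h>

/*
 * Copies src into dst, escaping it the way Canonical XML escapes element
 * text: & becomes &amp;, < becomes &lt;, > becomes &gt;, and a carriage
 * return becomes &#xD;. Returns 0 on success or -1 if dst (of dst_size
 * bytes, including the terminating NUL) is too small.
 * Hypothetical helper for illustration; not part of the Genx API.
 */
int escape_text(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        char single[2] = { *src, '\0' };
        const char *rep;
        size_t len;

        switch (*src) {
        case '&':  rep = "&amp;"; break;
        case '<':  rep = "&lt;";  break;
        case '>':  rep = "&gt;";  break;
        case '\r': rep = "&#xD;"; break;
        default:   rep = single;  break;   /* ordinary character, copied as is */
        }

        len = strlen(rep);
        if (used + len + 1 > dst_size)     /* keep room for the NUL */
            return -1;
        memcpy(dst + used, rep, len);
        used += len;
    }

    dst[used] = '\0';
    return 0;
}
```

Attribute values follow a slightly different rule in the same specification (double quotes, tabs, and line feeds are escaped as well), so a real writer needs a second variant of this function.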
In the next example we will explore a different approach for
writing an XML document with Genx. The program tock.c
declares elements, an attribute, and a namespace before it
uses them, then writes elements and an attribute with
different functions that are more efficient than their
"literal" counterparts. It also writes its non-canonical output
to a file. Here is the code:
#include <stdio.h>
#include "../genx.h"

int main(void)
{
    genxWriter w = genxNew(NULL, NULL, NULL);
    FILE *f = fopen("tock.xml", "w");
    genxElement time, hr, min, sec;
    genxAttribute tz;
    genxNamespace tm;
    genxStatus status;

    if (f == NULL) {                  /* could not open the output file */
        perror("tock.xml");
        return 1;
    }

    /* Declare the namespace, elements, and attribute before using them */
    tm = genxDeclareNamespace(w, "http://www.wyeast.net/time", "tm", &status);
    time = genxDeclareElement(w, tm, "time", &status);
    tz = genxDeclareAttribute(w, NULL, "timezone", &status);
    hr = genxDeclareElement(w, tm, "hour", &status);
    min = genxDeclareElement(w, tm, "minute", &status);
    sec = genxDeclareElement(w, tm, "second", &status);

    /* Start the document first so the writer has an output stream */
    genxStartDocFile(w, f);
    genxAddText(w, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    genxPI(w, "xml-stylesheet", " href=\"tock.xsl\" type=\"text/xsl\" ");
    genxComment(w, " the current date ");
    genxAddText(w, "\n");

    genxStartElement(time);
    genxAddAttribute(tz, "GMT");
    genxAddText(w, "\n ");
    genxStartElement(hr);
    genxAddText(w, "23");
    genxEndElement(w);                /* closes tm:hour */
    genxAddText(w, "\n ");
    genxStartElement(min);
    genxAddText(w, "14");
    genxEndElement(w);                /* closes tm:minute */
    genxAddText(w, "\n ");
    genxStartElement(sec);
    genxAddText(w, "52");
    genxEndElement(w);                /* closes tm:second */
    genxAddText(w, "\n");
    genxEndElement(w);                /* closes tm:time */
    genxEndDocument(w);

    fclose(f);
    return 0;
}
The second statement inside main opens the output file by
calling the fopen function with two arguments: the filename
(tock.xml) where the output is to be written and the
mode string "w", which opens the file for writing (it has nothing to
do with the writer variable w). Following that, four elements (time,
hr, min, and sec) are declared
to be of type genxElement. The attribute
tz is declared to be of type genxAttribute,
and the namespace tm is declared
with genxNamespace.
status is of type genxStatus,
an enum whose values, such as
GENX_SUCCESS and GENX_BAD_NAME, report whether a call
succeeded or how it failed.
The address of status (&status) is passed as the last argument of the
declaration functions that follow.
After the initial declarations, all these variables are initialized
with an appropriate function, genxDeclareNamespace,
genxDeclareElement,
and genxDeclareAttribute. For example, the namespace
variable tm is given a namespace name
(http://www.wyeast.net/time) and a prefix
(tm) with the genxDeclareNamespace
function:
tm = genxDeclareNamespace(w, "http://www.wyeast.net/time",
"tm", &status);
The genxAddText function inserts strings — an
XML declaration and new line characters and spaces — into the
file output stream. The addition of the XML declaration is what makes
the output non-canonical.
The functions genxPI and genxComment
write an XML stylesheet processing instruction and a comment,
respectively. Then the functions genxStartElement and
genxAddAttribute begin writing the markup. These functions
take a previously declared object rather than a literal string, and
they perform better than their
counterparts genxStartElementLiteral
and genxAddAttributeLiteral. Other functions, such as
genxAddText and genxEndElement, work the same way
with both variations of the element and attribute creation functions;
here genxAddText also inserts the whitespace between elements.
To run the program, type the word tock at a command or
shell prompt. Genx will then create the file tock.xml,
shown here:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="tock.xsl" type="text/xsl" ?>
<!-- the current date -->
<tm:time xmlns:tm="http://www.wyeast.net/time" timezone="GMT">
 <tm:hour>23</tm:hour>
 <tm:minute>14</tm:minute>
 <tm:second>52</tm:second>
</tm:time>
Just for fun, this non-canonical output can be transformed with
the XSLT stylesheet tock.xsl and validated with the RELAX NG schema
tock.rng. Both files are in the example archive.
There are a number of other Genx functions that I have not touched
on, such as the memory management
functions genxGetAlloc and
genxSetAlloc. My take is that Tim Bray is on the right
track, and that if you use C and you need to generate XML output, you
will no doubt find that Genx is an efficient tool.
XML.com Copyright © 1998-2006 O'Reilly Media, Inc.