XML Source Highlighting

July 30, 2003

Kyle Downey

Editor's Note: From time to time XML.com authors share with me favorite tricks they use in writing and formatting articles. In this article, Kyle Downey goes one step further and shares his software too.

Adding Source Highlighting to XHTML

They say a successful open source project starts with an itch to scratch. In the case of this package, the itch was my own: I had started producing all of my documentation and other technical writing in XHTML, and I wanted a way to ensure that

  1. any XML samples included would be valid,
  2. older browsers could view it, and
  3. it would be possible to blend source and documentation without cut-and-paste.

In XML your safest choice for including XML and program source is a CDATA section, which looks something like this: <![CDATA[ ... ]]>. A CDATA section tells the XML parser to treat everything up to the end marker ]]> as plain character data rather than markup. Inside you can have left and right angle brackets galore.
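As a quick check of that behavior, here is a minimal sketch using the DOM parser that ships with the JDK. The class name `CdataDemo` and the helper `rootText` are made up for this illustration; they are not part of the package described in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of xmltools.jar.
final class CdataDemo {

    /** Parses an XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // The brackets and ampersands inside the CDATA section are not parsed as markup.
        String xml = "<pre><![CDATA[if (a < b && b > c) { return \"<foo/>\"; }]]></pre>";
        System.out.println(rootText(xml));
        // prints: if (a < b && b > c) { return "<foo/>"; }
    }
}
```

The one sequence a CDATA section cannot contain is its own end marker, `]]>`.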

For a portable file-inclusion mechanism that doesn't rely upon native XInclude support in your parser, your best bet is to define an entity, e.g.:


<?xml version="1.0"?>
<!DOCTYPE html [
     <!ENTITY example1 SYSTEM "./example1.java">
]>

Here's the source:

&example1;

This will work. Unfortunately, the syntax is not tag-based, it is unfamiliar to some novice XML users, and -- worst of all -- it puts the declaration of the import far from its one-time use elsewhere in the document.
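To watch the entity mechanism expand, here is a small sketch with the JDK's DOM parser. It assumes a parser that loads external general entities, which is the usual JAXP default. The class `EntityIncludeDemo` is hypothetical, and the entity body comes from an in-memory string through an `EntityResolver` rather than from a real `example1.java` on disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of xmltools.jar.
final class EntityIncludeDemo {

    /**
     * Parses a document that declares an external entity and returns the root
     * element's text after the parser has expanded the entity reference.
     */
    static String expand(String xml, String entityFileName, String entityBody) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Serve the entity from memory; returning null falls back to the default lookup.
            builder.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith(entityFileName)
                            ? new InputSource(new StringReader(entityBody))
                            : null);
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not expand entity", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE pre [\n"
                + "  <!ENTITY example1 SYSTEM \"./example1.java\">\n"
                + "]>\n"
                + "<pre>&example1;</pre>";
        System.out.println(expand(xml, "example1.java", "class Example1 {}"));
    }
}
```

Note that the included file is itself parsed as XML content, so a source file containing a bare `<` or `&` would be rejected unless those characters are escaped.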

The other issue with both these mechanisms is that they don't give us much choice about how the browser renders our example code. Many modern editors -- and even your browser's View Source feature -- do a very nice job of making source code more readable than a plain <pre> tag does, so it seems a shame for our technical documents to have only black-and-white monospaced source listings.

What we'd really like is an XML tag that lets us include an XML file:

<highlight:xml href="example.xml"/>

or maybe some Perl code

<highlight:source-code lang="perl" href="example.pl"/>

or even an in-line XML fragment

<highlight:xml>
    <foo>
        <!-- hi, I'm a comment -->
        <bar/>
    </foo>
</highlight:xml>

or Java fragment

<highlight:source-code lang="java">
    public class HelloWorld {
        public void sayHello(String name) {
            System.out.println("hello, " + name);
        }
    }
</highlight:source-code>

Once we have a tag we can process, we can apply additional transformations to that XML node to create much nicer XHTML.
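To make "a tag we can process" concrete, the sketch below finds such elements with a namespace-aware DOM parse. It is not the package's actual implementation. The namespace URI `urn:example:highlight` is a placeholder, since the article does not state the real one, and `HighlightTagFinder` is a hypothetical class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of xmltools.jar.
final class HighlightTagFinder {

    // Placeholder namespace URI; the real one is not given in the article.
    static final String HIGHLIGHT_NS = "urn:example:highlight";

    /** Returns one "localName lang=... href=..." summary per highlight element, in document order. */
    static List<String> describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on the namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(HIGHLIGHT_NS, "*");
            List<String> found = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // getAttribute returns an empty string when the attribute is absent
                found.add(e.getLocalName()
                        + " lang=" + e.getAttribute("lang")
                        + " href=" + e.getAttribute("href"));
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("could not scan document", e);
        }
    }
}
```

Each element found this way could then be replaced by its highlighted XHTML rendering.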

Choices made

Since this is a tool to help with my writing, I didn't want to invest too much time in its development. I decided to reuse an existing source highlighter and write a thin wrapper class around it. I also sought out an XML stylesheet that pretty-prints XML source. Putting them together took just a handful of Java classes. This let me complete the project very quickly, and with support for many more languages than if I had rolled my own highlighter by throwing a Java grammar at a parser toolkit like SableCC.

Why not an XSLT extension?

As soon as there's some published standard for Java extensions for all XSL processors out there, rather than a smorgasbord of APIs, I'll probably convert this to an XSLT extension. Most of the code that merges the highlighted nodes into the XSLT is surely done much better in the XSL processors out there, and there's no point in duplicating that work.

Command-line highlighting

The first prerequisite for this package is JDK 1.4; I tested with J2SDK 1.4.1_02 on Windows, but any implementation should do.

Next, to process XHTML, put xmltools.jar in your CLASSPATH. You should also download the GNU Source-Highlight package; this code was tested with version 1.7. The main site is at gnu.org. Windows users can get binaries from the GNU-Win32 project; make sure source-highlight.exe (or source-highlight) is in your PATH. You will then be able to run:


java org.amberarcher.xml.tool.highlight.Main file:input.xhtml output.xhtml

        

to process your marked-up XHTML into plain XHTML. (Leave off the output argument to write the result to STDOUT.) Note one quirk of the current version: it requires a URI (file: in the example), not a filename, for the input.
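If you only have a plain filename to hand, the JDK can turn it into a file: URI for you. The sketch below uses `File.toURI()`, which has been available since JDK 1.4. `FileUriDemo` is a hypothetical helper, and whether the tool accepts the absolute form it produces as readily as the relative `file:input.xhtml` shown above is an assumption.

```java
import java.io.File;

// Hypothetical helper, not part of xmltools.jar.
final class FileUriDemo {

    /** Converts a plain filename into an absolute file: URI string. */
    static String toFileUri(String filename) {
        return new File(filename).toURI().toString();
    }

    public static void main(String[] args) {
        // The exact result depends on the current working directory.
        System.out.println(toFileUri(args.length > 0 ? args[0] : "input.xhtml"));
    }
}
```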

API highlighting

The highlighter does not require the command line. If you want to do highlighting as part of a larger program (GPL or other open source license required), you can call HighlightEngine:




import javax.xml.transform.Result;
import org.xml.sax.InputSource;
import org.amberarcher.xml.tool.highlight.*;

// set up the engine to use the source-highlight
// processor and the XML stylesheet processor
HighlightEngine he = new HighlightEngine();
he.addHighlighter(new SourceHighlighter());
he.addHighlighter(new XmlHighlighter());

// define a source and a result and process
InputSource src = // ...
Result res = // ...

he.highlight(src, res);
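The source and result are left open in the listing. One plausible way to build them with standard JDK classes is sketched below. `HighlightIo` and its method names are hypothetical, and the idea that the engine uses the InputSource's system ID to resolve relative includes is an assumption rather than something the article confirms.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of xmltools.jar.
final class HighlightIo {

    /** Builds an InputSource identified by the file's URI (its system ID). */
    static InputSource sourceFor(File input) {
        return new InputSource(input.toURI().toString());
    }

    /** Builds a Result that writes the processed document to a file. */
    static Result resultFor(File output) {
        return new StreamResult(output);
    }

    /** Builds a Result that writes the processed document to standard output. */
    static Result stdoutResult() {
        return new StreamResult(System.out);
    }
}
```

With these in place, the two open lines could read `InputSource src = HighlightIo.sourceFor(new File("input.xhtml"));` and `Result res = HighlightIo.resultFor(new File("output.xhtml"));`.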

In a later version I'll probably use this API to create an Ant task, so that document generation can be part of a larger build process.

CSS

If you just highlight a document and open it in your browser, you'll find that there's no color syntax highlighting. To get color, you'll need to define a CSS stylesheet similar to this:


.namespace {
    color: darkblue;
}

.element,
.attribute,
.keyword {
    color: maroon;
    font-weight: bold;
}

.comment {
    color: gray;
    font-style: italic;
}

.string {
    color: green;
}

.type {
    color: teal;
    text-decoration: underline;
}

For your convenience the source code package includes this in etc/highlight.css.

Relative URIs

All file includes (any href attribute) fully support relative URIs. Just drop the file: prefix, and the processor will create an absolute URI by resolving the included filename relative to the source document's URI.
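The resolution rule is the standard one for URIs, and `java.net.URI` (available since JDK 1.4) can illustrate it. This is a sketch of the rule, not the processor's own code; `RelativeHrefDemo` and the paths are invented for the example.

```java
import java.net.URI;

// Hypothetical demo class, not part of xmltools.jar.
final class RelativeHrefDemo {

    /** Resolves an href against the URI of the document that contains it. */
    static String resolveHref(String documentUri, String href) {
        return URI.create(documentUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String doc = "file:/docs/article/input.xhtml";
        System.out.println(resolveHref(doc, "example.pl"));
        // prints: file:/docs/article/example.pl
        System.out.println(resolveHref(doc, "../src/HelloWorld.java"));
        // prints: file:/docs/src/HelloWorld.java
    }
}
```

An href that is already absolute passes through unchanged.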

Language support

As shown in the examples, the XHTML highlighter supports two modes: one for XML, the other for programming languages supported by the source-highlight package. The "lang" attribute is used to specify which set of keywords to use. As of version 1.7 of the source-highlight package, this includes:

  • java (for Java)
  • cpp (for C/C++)
  • prolog (for Prolog)
  • perl (for Perl)
  • php3 (for Php3)
  • python (for Python)
  • flex (for flex)
  • changelog (for ChangeLog)
  • ruby (for Ruby)
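If you generate documents programmatically, it may be worth checking a lang value against this list before handing the document to the highlighter. The sketch below is a hypothetical helper, not part of the package. It matches names exactly, on the assumption that the values are case-sensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of xmltools.jar.
final class SupportedLanguages {

    // The lang values listed above for source-highlight 1.7.
    static final Set<String> LANGS = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "java", "cpp", "prolog", "perl", "php3", "python", "flex", "changelog", "ruby")));

    /** Returns true if the given lang value appears in the list; exact match only. */
    static boolean isSupported(String lang) {
        return lang != null && LANGS.contains(lang);
    }
}
```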

License

This software is available freely for use under the GNU GPL. You may also choose to accept this rider if you wish to combine this package with an open source but non-GPL offering:


In addition, as a special exception, Amber Archer Consulting Co., Inc.
gives permission to link the code of this program with any library or
application (or with modified versions of any library or application that
use the same license) available under a license approved by the Open
Source Initiative (http://www.opensource.org), and distribute linked
combinations including the two. You must obey the GNU General Public
License in all respects for all of the code used other than the
aforementioned open source libraries or applications. If you modify this
package, you may extend this exception to your version of the file, but
you are not obligated to do so. If you do not wish to do so, delete
COPYING.RIDER from your version.

In other words, if you want to build more open source software with this, you can. If you want to make it part of your commercial offering, you need to comply with the GPL.

Credits