 Published on XML.com http://www.xml.com/pub/a/1999/09/conformance/val-analysis.html

Validating XML Processors
By David Brownell
September 15, 1999

Contents

Part 1: Conformance Testing for XML Processors
Part 2: Background on Conformance Testing
Part 3: Non-Validating XML Processors
Part 4: Validating XML Processors
Part 5: Summary

Not many validating XML processors are available at this time, and most of them are available with a non-validating sibling. The suppliers are all commercial; there are no Open Source validating processors supporting the SAX API, so far as I am currently aware. Applications needing to enforce document type declarations do have options available in other programming languages. Notably, C/C++ packages are freely available, sometimes with SGML support.

This table provides an alphabetical quick reference to the results of the analysis for validating processors:

Processor Name and Version: IBM XML4j 2.0.15 (August 30, 1999)
Passed Tests: 832
Summary: This has the problems of its non-validating sibling, and does not permit processing to continue after validity errors.

Processor Name and Version: Microsoft MSXML in Java, JVM 5.00.3186 (August 24, 1999)
Passed Tests: 615
Summary: It's curious that this was bundled into Microsoft's Java VM without fixing its well-known conformance bugs. Avoid using it.

Processor Name and Version: Oracle XML Parser 2.0.0.2 (August 11, 1999)
Passed Tests: 871
Summary: If this just permitted processing to continue after validity errors, it would be a top contender.

Processor Name and Version: Sun "Java Project X" TR2 (May 21, 1999)
Passed Tests: 1065
Summary: No conformance violations detected.

More detailed discussion of each processor is below, in alphabetical order, with links to the complete testing reports.

IBM XML4j

Processor Name: IBM XML4j
Version: 2.0.15 (August 30, 1999)
Type: Validating
DOM Bundled: Yes
Size of JAR File: 722 KBytes (uncompressed)
Download From: http://www.alphaworks.ibm.com/

This is the validating version of IBM's processor. See the coverage of the non-validating processor for more details.

Rating:
Full Test Results: report-xml4j-val.html
Raw Results: Passed 902 (of 1065)
Adjusted Results: Passed 832

The same problems that show up in the non-validating processor also show up in the validating one ... in fact, the processor appears to be doing exactly the same thing in both cases! (I confess this discovery was quite a surprise to me; it may be that this version of the IBM processor is a regression from earlier releases in this respect.)

No validity errors were reported as such; all the invalid documents caused incorrect reports of fatal errors.

Microsoft MSXML in Java

Processor Name: Microsoft MSXML
Version: JVM 5.00.3186 (August 24, 1999)
Type: Validating
DOM Bundled: No
Size of JAR File: N/A (bundled with JVM)
Download From: http://www.microsoft.com/java/

Note that although this parser was originally called MSXML, Microsoft currently uses that term exclusively for its IE5 COM parser ("MSXML.DLL"). The more recent name for the Java parser is the "Microsoft XML Parser in Java". Please do not interpret these results as reflecting conformance for the C parser found in the Internet Explorer 5 web browser.

The MSXML package was originally intended to provide XML support for Internet Explorer 4 users. It was recently bundled with Microsoft's latest version (build 3186) of their Java Virtual Machine and SDK 3.2. A SAX driver is separately available. None of the standard programming interfaces (SAX, DOM) are bundled.

Rating:
Full Test Results: report-msxml.html
Raw Results: Passed 648 (of 1065)
Adjusted Results: Passed 615

This processor needs a separate SAX driver, since Microsoft has not yet offered support for the SAX API. The processor rejects a substantial number of documents that it should accept, producing fatal errors:

In addition, this processor has entered infinite loops when asked to parse some documents. This has been observed with UTF-16 input text (which sometimes produces less drastic errors) as well as with some numeric character references. Such errors are quite dangerous.

Many output tests failed; more than seems usual.

As for documents which should have been rejected but were in fact accepted, there were many of those also:

Support for multiple text encodings seems weak; documents declared as being encoded in "UTF-16" were inappropriately rejected. Japanese encodings were neither rejected nor handled consistently.

This test harness shows that when a SAX ErrorHandler callback is used to report an exception, that exception will not be passed back to the application through the Parser.parse() call. This appears to be a driver issue with a simple fix.
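
For comparison, here is a minimal sketch of the behavior one would expect from a correct driver: an exception thrown inside an ErrorHandler callback comes back out of the parse() call. It is not the test harness used for this report. It assumes the SAX2 and JAXP classes bundled with current JDKs rather than the SAX 1.0 Parser interface discussed here, and the class name ErrorPropagationSketch, the ThrowingHandler class, and the sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ErrorPropagationSketch {

    // Hypothetical sample: the required "id" attribute is missing,
    // which is a validity error, not a well-formedness error.
    static final String INVALID_DOC =
        "<!DOCTYPE doc [\n"
      + "  <!ELEMENT doc EMPTY>\n"
      + "  <!ATTLIST doc id CDATA #REQUIRED>\n"
      + "]>\n"
      + "<doc/>";

    // Hypothetical sample: the same document type, with no violations.
    static final String VALID_DOC =
        "<!DOCTYPE doc [\n"
      + "  <!ELEMENT doc EMPTY>\n"
      + "  <!ATTLIST doc id CDATA #REQUIRED>\n"
      + "]>\n"
      + "<doc id=\"a\"/>";

    /** Handler that turns every recoverable (validity) error into an exception. */
    static class ThrowingHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw new SAXException("validity error: " + e.getMessage(), e);
        }
    }

    /** Returns the SAXException that parse() passed back, or null if there was none. */
    static SAXException parseAndCatch(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // enforce the DTD
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new ThrowingHandler());
            return null; // nothing was thrown, or the driver swallowed it
        } catch (SAXException e) {
            return e; // the handler's exception reached the application
        } catch (Exception e) { // configuration or I/O trouble, not expected here
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SAXException e = parseAndCatch(INVALID_DOC);
        System.out.println(e == null ? "no exception propagated" : e.getMessage());
    }
}
```

A driver with the problem described above would make parseAndCatch() return null for the invalid document, even though the handler threw.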

Oracle XML Parser

Processor Name: Oracle XML Parser
Version: 2.0.0.2 (August 11, 1999)
Type: Validating
DOM Bundled: Yes
Size of JAR File: 556 KBytes (uncompressed)
Download From: http://www.oracle.com/xml/

This is the validating mode of Oracle's new processor. See the coverage of its non-validating mode for more details.

Rating:
Full Test Results: report-oracle-val.html
Raw Results: Passed 871 (of 1065)
Adjusted Results: Passed 871

This processor failed to accept a few valid documents, and it had some difficulty handling NMTOKENS attribute lists and a mixed content specification.

This shared some of the problems that its non-validating sibling had with reading UTF-16 and multibyte UTF-8 characters. Similarly, it also had problems with names which actually tried to exercise the variety of name characters permitted by the XML specifications.

The output was much more correct than the output from its non-validating sibling. That's a bit puzzling, but it does suggest that the core engine needs only minor tweaks to make sure they're both equally correct.

Also of note is the fact that this validating processor accepted none of the SGML-isms that the non-validating one allowed in its input DTD syntax. Again, the non-validating processor should be acting more like the validating one.

Other than rejecting documents it shouldn't, the problems with this processor mostly related to validity violations that were not reported at all, including:

Internal errors were reported in various cases when validating illegal documents. These included array bounds exceptions (e.g. with NMTOKENS attributes) and null pointer exceptions working with IDREF/IDREFS values.

Significantly, none of the validity errors were continuable; some were correctly reported as non-fatal through SAX, but then the processor refused to continue processing. (As noted above, some were incorrectly reported as fatal errors in the first place.)
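
To make "continuable" concrete, the following is a rough sketch of what an application would typically do: record each validity error in the ErrorHandler's error() callback and return normally, which asks the processor to keep going. As an assumption for illustration, it uses the SAX2 and JAXP classes bundled with current JDKs, not the SAX 1.0 interfaces or any of the processors tested here. The class name ContinuableValidationSketch, the Collector class, and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ContinuableValidationSketch {

    // Hypothetical sample: both <item> elements lack the required "id"
    // attribute, so there are two independent validity errors.
    static final String SAMPLE =
        "<!DOCTYPE doc [\n"
      + "  <!ELEMENT doc (item, item)>\n"
      + "  <!ELEMENT item EMPTY>\n"
      + "  <!ATTLIST item id CDATA #REQUIRED>\n"
      + "]>\n"
      + "<doc><item/><item/></doc>";

    /** Records validity errors and lets the parse continue. */
    static class Collector extends DefaultHandler {
        final List<String> validityErrors = new ArrayList<>();
        boolean reachedEnd = false;

        @Override
        public void error(SAXParseException e) {
            // Recoverable error: note it and return without throwing.
            validityErrors.add("line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void endDocument() {
            // Only reached if the processor really did keep going.
            reachedEnd = true;
        }
        // fatalError() is inherited from DefaultHandler and still throws,
        // so well-formedness errors stop the parse.
    }

    static Collector validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // enforce the DTD
            SAXParser parser = factory.newSAXParser();
            Collector collector = new Collector();
            parser.parse(new InputSource(new StringReader(xml)), collector);
            return collector;
        } catch (Exception e) { // fatal error, configuration, or I/O trouble
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Collector c = validate(SAMPLE);
        for (String message : c.validityErrors) {
            System.out.println(message);
        }
        System.out.println("reached end of document: " + c.reachedEnd);
    }
}
```

With a processor that treats validity errors as continuable, the second missing attribute is reported as well and endDocument() is called. A processor that refuses to continue would stop after the first report.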

Sun "Java Project X"

Processor Name: Sun "Java Project X"
Version: TR2 (May 21, 1999)
Type: Validating
DOM Bundled: Yes
Size of JAR File: 132 KBytes (or 246 KBytes uncompressed)
Download From: http://java.sun.com/products/xml

This is the validating mode of Sun's processor; its non-validating mode is presented elsewhere.

Rating:
Full Test Results: report-sun-val.html
Raw Results: Passed 1065 (of 1065)
Adjusted Results: Passed 1065

This processor reported no conformance errors. That was a design goal of the processor.

I analysed the negative results when I worked at Sun, and believe that every diagnostic reports the correct error. (This is the only parser that I can report I have carefully examined for that issue.)

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.