XML.com 
 Published on XML.com http://www.xml.com/pub/a/2000/08/30/msxml/index.html

 

MSXML Conformance Update
By Chris Lovett
August 30, 2000


Overview

Table of Contents

Overview
The Test Suite
The Test Harness
MSXML Results
Non-validating Mode
Validating Mode
MSXML Conformance Via SAX
Conclusion

This article is an update to previous articles by David Brownell on the conformance of the Microsoft XML Parser (MSXML). The July 2000 MSXML 3.0 Beta Release has made a significant improvement in conformance against the OASIS XML conformance test suite.

Besides reporting the latest OASIS conformance results, this article also reports on the compliance of the new Visual Basic SAX interface included in MSXML 3.0. To run compliance testing on this component I developed a brand new test harness in Visual Basic, which is also included with this article.

The Test Suite

The OASIS XML conformance test suite is a published set of tests, collected over time from various sources, which measure the conformance of XML parsers against the W3C XML 1.0 specification. It does not include any tests from Microsoft at this time.

For my test, I downloaded the updated test suite that David Brownell published in February. This updated test suite takes into account the W3C errata for the XML 1.0 specification. I made two modifications to this suite:

See xmlconf-20000821.zip for my updated version of the test suite.

The Test Harness

I used the same ECMAScript test harness that David Brownell published, except for one minor modification. This modification stemmed from the issue of what to do with tests marked "valid" that have no DTD (document type definition) at all. David's test harness treated this issue in a manner contrary to the design of MSXML.

For XML documents that have no DTD at all, MSXML successfully loads the documents, even when validateOnParse=true. The MSXML API designers feel that the API is more usable this way since you can still load documents that require validation, regardless of whether their DTDs are available. This is an API design issue, and I believe it should not become a conformance issue. Conformance should be about the parser implementation of the XML 1.0 specification, not about how that parser is packaged. If you really want to verify whether a document is validated against a DTD, you need to add the following extra check (the doc.doctype != null test) to your code:

var doc = new ActiveXObject("MSXML2.DOMDocument.3.0");
doc.validateOnParse = true;
if (doc.load(test) && doc.doctype != null) {
    // ok now we know it is valid and there really was a DTD.
} else {
    // either load failed or there was no DTD.
}

David's harness does not do this extra check. Hence, it reports a whole bunch of failures against MSXML. I added this extra check to my modified test harness so that it uses MSXML correctly.
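The check above can be wrapped in a small helper so that a harness can tell the three possible outcomes apart. This is a minimal sketch, not code taken from either harness: the function name classifyLoad and the result strings are hypothetical, and the document object is passed in so the logic can be exercised without MSXML installed. It relies only on the load method and the doctype property used above.

```javascript
// Hypothetical helper (not part of either harness): classify the outcome
// of loading one test document. `doc` is expected to behave like an
// MSXML2.DOMDocument.3.0 object, that is, to expose load() and doctype.
function classifyLoad(doc, url) {
    doc.validateOnParse = true;
    if (!doc.load(url)) {
        return "load-failed";      // the parser reported an error
    }
    if (doc.doctype == null) {
        return "loaded-no-dtd";    // loaded, but there was no DTD to validate against
    }
    return "valid";                // loaded, and there really was a DTD
}
```

Under Windows Script Host this would be called as classifyLoad(new ActiveXObject("MSXML2.DOMDocument.3.0"), test).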

MSXML Results

Using the test suite and test harness previously described, the July 2000 MSXML Beta Release achieved the following results.

Mode              Raw Results    Pass Rate
Non-Validating    1016/1071      95%
Validating        1042/1071      97%
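The pass rates in the table above follow directly from the raw results. As a small worked example, the sketch below recomputes them; the function name passRate is hypothetical, and rounding to the nearest whole percent is an assumption about how the published figures were derived.

```javascript
// Pass rate as a whole-number percentage.
// Assumption: the published figures are rounded to the nearest percent.
function passRate(passed, total) {
    return Math.round((passed / total) * 100);
}

var nonValidatingRate = passRate(1016, 1071);  // 1016/1071 is about 0.9486
var validatingRate = passRate(1042, 1071);     // 1042/1071 is about 0.9729
```

With these inputs the two rates come out as 95 and 97, matching the table.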

A number of failures are due to the fact that, even though validateOnParse=false, MSXML still "processes" the DTDs and external entities so that it can do entity expansion and report default attributes. Whether MSXML loads external entities or not depends on the value of the resolveExternals flag. However, setting this flag to false causes even more failures, because the OASIS tests paradoxically assume that external DTDs and entities are going to be available even in non-validating mode. The OASIS tests should provide a "standalone" indication in the test descriptions so that we could use this information to turn resolveExternals on or off accordingly.
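To make the proposal concrete, here is a sketch of how a harness might use such an indication if it existed. The standalone field on the test description is hypothetical (the OASIS test descriptions do not provide it today), as is the function name configureParser. The validateOnParse and resolveExternals properties are the MSXML flags discussed above.

```javascript
// Hypothetical: `testCase.standalone` stands in for the per-test
// "standalone" indication proposed above; it does not exist in the
// current OASIS test descriptions.
function configureParser(doc, testCase, validating) {
    doc.validateOnParse = validating;
    // Only resolve external DTDs and entities when the test depends on them.
    doc.resolveExternals = !testCase.standalone;
    return doc;
}
```

A standalone test would then run with resolveExternals=false, and every other test would keep the current behavior.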

The only way to fully pass the test suite is to silently ignore problems found in the DTDs. The MSXML team believes (from years of experience with Data Access APIs) that silent failures are generally a bad idea. Clearly, this difference in philosophy needs to be resolved.

Non-validating Mode

To test non-validating mode, you set validateOnParse=false on the MSXML2.DOMDocument.3.0 object. (Note that you don't need to specify the version-dependent ProgID if you install the MSXML Beta Release in replace mode. With MSXML installed in replace mode, you can get identical test results using MSXML.DOMDocument.)

The following command line produces the full non-validating parser report:

cscript harness.js /parser=MSXML2.DOMDocument.3.0 
    /defparser=MSXML2.DOMDocument.3.0 /nvreport=msxml3-nv.html 
    /suite=e:\oasis2000\xmlconf\xmlconf.xml

The following table describes in more detail the failures reported by the test harness.

Bucket    Description
Bad Entities (8 Tests)    Several tests define entities that cannot legally be expanded, like <!ENTITY lt "<">, or entities that simply do not exist, like <!ENTITY noop SYSTEM "nop.ent">. MSXML always expands the entities. Even though the document instance does not use them at parse time, they could be used at run time when createEntityReference is called. I believe these tests conflict with a complete DOM implementation. (valid-sa-065, valid-sa-100, pr-xml-little, pr-xml-utf-16, pr-xml-utf-8, ext01, o-p73pass1, o-p75pass1).
Output Tests (3 Tests)    Several output tests fail because the above input tests fail. (valid-sa-065, valid-sa-100, ext01).
Attribute-Value Normalization (13 Tests)    MSXML 3.0 still does not perform Attribute-Value Normalization as described in the XML specification. (valid-sa-043, valid-sa-058, valid-sa-096, valid-sa-104, valid-sa-108, valid-sa-110, valid-sa-111, not-sa02, not-sa03, not-sa04, notation01, sa02, sa04).
xml:lang (6 Tests)    These tests fail because I changed tests lang01 through lang06 from "invalid" to "valid": Errata E73 says that the XML processor does not deal with the value of xml:lang; it just passes it on to the application. Unfortunately, MSXML was recently changed to check these values because of the OASIS test suite itself, which was the wrong thing to do. MSXML will be changed back in the next release.
Doing Validation (25 Tests)    MSXML reports some validity constraints even when validateOnParse=false, like "Parameter entity not defined", or "The replacement text for a parameter entity must be properly nested with parenthesized groups". (not-wf-not-sa-005, invalid--001, invalid--002, invalid--003, invalid--004, invalid--005, invalid--006, inv-dtd01, inv-dtd02, inv-dtd06, el04, el05, id03, id04, id05, attr04, attr08, attr09, attr10, attr11, attr12, attr13, attr14, attr15, attr16).
Total    55 failures
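For readers unfamiliar with the attribute-value normalization bucket, the sketch below illustrates the whitespace part of the rule in section 3.3.3 of the XML 1.0 specification. It is a simplification and not MSXML code: the function name normalizeAttributeValue is hypothetical, and the input is assumed to contain no character or entity references, which the full algorithm handles separately.

```javascript
// Simplified sketch of XML 1.0 attribute-value normalization (section 3.3.3).
// Assumption: `raw` contains no character or entity references.
function normalizeAttributeValue(raw, attType) {
    // Line ends are normalized first, then each whitespace character
    // (space, tab, LF, CR) is replaced by a single space character.
    var value = raw.replace(/\r\n?/g, "\n").replace(/[ \t\n\r]/g, " ");
    if (attType !== "CDATA") {
        // Non-CDATA types also drop leading and trailing spaces and
        // collapse runs of spaces into one.
        value = value.replace(/^ +| +$/g, "").replace(/ {2,}/g, " ");
    }
    return value;
}
```

A parser that skips this step hands the application literal tabs and newlines where the specification calls for spaces, which is the kind of difference the tests in this bucket detect.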

Validating Mode

To test validating mode, you set validateOnParse=true on the MSXML2.DOMDocument.3.0 object. (Note that you don't need to specify the version-dependent ProgID if you install the MSXML Beta Release in replace mode. With MSXML installed in replace mode, you can get identical test results using MSXML.DOMDocument.)

The following command line produces the full validating parser report:

cscript harness.js /parser=MSXML2.DOMDocument.3.0 
    /defparser=MSXML2.DOMDocument.3.0 /vreport=msxml3-val.html 
    /suite=e:\oasis2000\xmlconf\xmlconf.xml

The following table describes in more detail the failures reported by the test harness.

Bucket    Description
Bad Entities (7 Tests)    The same issues as with non-validating mode.
Output Tests (3 Tests)    Several output tests fail because the above input tests fail. (valid-sa-012, valid-sa-065, valid-sa-100).
Attribute-Value Normalization (13 Tests)    The same issues as with non-validating mode.
xml:lang (6 Tests)    The same issues as with non-validating mode.
Total    29 failures
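As a quick consistency check, the bucket counts in the validating-mode table should add up to the number of tests that did not pass (1071 minus 1042). The sketch below does that arithmetic; the function name bucketsMatch and the data layout are hypothetical, and the counts are copied from the table above.

```javascript
// Sum the per-bucket failure counts and compare with total - passed.
function bucketsMatch(buckets, passed, total) {
    var sum = 0;
    for (var i = 0; i < buckets.length; i++) {
        sum += buckets[i].count;
    }
    return sum === total - passed;
}

var validatingBuckets = [
    { name: "Bad Entities", count: 7 },
    { name: "Output Tests", count: 3 },
    { name: "Attribute-Value Normalization", count: 13 },
    { name: "xml:lang", count: 6 }
];
// 7 + 3 + 13 + 6 = 29, and 1071 - 1042 = 29
var consistent = bucketsMatch(validatingBuckets, 1042, 1071);
```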

MSXML Conformance Via SAX

MSXML 3.0 now includes a SAX interface that you can use from Visual Basic applications. So, I ported David's ECMAScript test harness to Visual Basic, where I could test the compliance of this new component.

The MSXML SAX interface only supports non-validating mode at this time. Given that, the Visual Basic harness (VbSaxTest.zip for source code and executable) produces the same kind of report as David's harness.

Mode              Raw Results    Pass Rate
Non-Validating    1035/1071      97%

The following table summarizes the detailed report generated by the test harness:

Bucket    Description
Non-existent entities (2 Tests)    Only one test (ext01) got bitten by this, but it gets counted twice because it also causes the corresponding output test to fail.
Whitespace Normalization (21 Tests)    This is mostly the end-of-line handling (converting 0d 0a pairs into a single 0a).
Doing Validation (5 Tests)    These tests fail because the SAX parser is reporting validity constraints when it is running in non-validating mode. These are mostly related to the "Proper Declaration/PE Nesting" validity constraint.
xml:lang (6 Tests)    The same issues as with the DOM in non-validating mode.
Total    30 failures
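The end-of-line handling mentioned in the Whitespace Normalization bucket is defined in section 2.11 of the XML 1.0 specification: before parsing, a 0d 0a pair, and any 0d not followed by 0a, is translated to a single 0a. The sketch below shows that rule in isolation; the function name normalizeLineEnds is hypothetical and this is not code from MSXML or from either harness.

```javascript
// XML 1.0 section 2.11: translate CR LF pairs, and any CR that is not
// followed by LF, into a single LF.
function normalizeLineEnds(text) {
    return text.replace(/\r\n?/g, "\n");
}
```

For example, normalizeLineEnds("a\r\nb") yields "a\nb", which is what the affected tests expect the application to see.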

Conclusion

MSXML still has some issues to resolve relating to non-existent or malformed unused entities, attribute-value normalization, end-of-line handling, and reporting validity constraints when running in non-validating mode. However, you can see from the following table that MSXML is on a steady march towards 100% compliance.

Version                            Non-Validating             Validating
MSXML 2.5                          87% (931/1067)             83% (895/1067)
MSXML3 May Tech Preview Release    88% (941/1067)             85% (913/1067)
MSXML3 Beta Release                95% (1016/1071) for DOM    97% (1042/1071)
                                   97% (1035/1071) for SAX

XML.com Copyright © 1998-2006 O'Reilly Media, Inc.