
MSXML Conformance Update

August 30, 2000

Chris Lovett



Overview

Table of Contents

Overview
The Test Suite
The Test Harness
MSXML Results
Non-validating Mode
Validating Mode
MSXML Conformance Via SAX
Conclusion

This article is an update to previous articles by David Brownell on the conformance of the Microsoft XML Parser (MSXML). The July 2000 MSXML 3.0 Beta Release has made a significant improvement in conformance against the OASIS XML conformance test suite.

Besides reporting the latest OASIS conformance results, this article also reports on the compliance of the new Visual Basic SAX interface included in MSXML 3.0. To run compliance testing on this component I developed a brand new test harness in Visual Basic, which is also included with this article.

The Test Suite

The OASIS XML conformance test suite is a published set of tests, collected over time from various sources, which measure the conformance of XML parsers against the W3C XML 1.0 specification. It does not include any tests from Microsoft at this time.

For my test, I downloaded the updated test suite that David Brownell published in February. This updated test suite takes into account the W3C errata for the XML 1.0 specification. I made four modifications to this suite:

  • Since MSXML has built-in support for the W3C Namespaces in XML specification and there is no way to turn off this support, I changed those tests that were clearly not namespace compliant, like valid-sa-012, o-p04pass1, o-p05pass1, and o-p08pass1. (Because of this, I have also e-mailed the OASIS organization suggesting that they add a categorization for namespace compliance. I hope they do.)
  • Some entities are missing in David's February version of the test suite. For example, valid-not-sa-001 contains a DOCTYPE tag with the SYSTEM literal "001.ent", but the entity "001.ent" does not exist. The same is true of valid-not-sa-003, valid-ext-sa-003, and p31pass1.xml. So I created these missing entities (they are empty, and were accidentally omitted by David due to packaging errors).
  • Test "inv-not-sa03" was marked "invalid" when in fact it is not well formed. Errata E34 says that, for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference must match that in an entity declaration that does not occur within the external subset or a parameter entity, so I changed the TYPE attribute on this test to "not-wf".
  • David's version changes the xml:lang tests lang01 through lang06 from "error" to "invalid". I changed these to "valid" because Errata E73 (issued since David's article was written) says that the XML processor does not deal with the value of xml:lang; it just passes it on to the application. Unfortunately, MSXML was recently changed to check these values because of the OASIS test suite itself, which was the wrong thing to do. MSXML will be changed back in the next release.
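The two type corrections in the list above can also be written down as data. The following is only a sketch under stated assumptions: the corrections for this article were made in the suite itself, and the typeOverrides table, the effectiveType function, and the id/type fields on a test object are hypothetical names introduced here for illustration.

```javascript
// Hypothetical override table: test ID -> corrected TYPE value.
// The IDs and new types come from the list above; the table is illustrative.
var typeOverrides = {
    "inv-not-sa03": "not-wf", // Errata E34: not well formed rather than invalid
    "lang01": "valid",        // Errata E73: xml:lang values are not checked
    "lang02": "valid",
    "lang03": "valid",
    "lang04": "valid",
    "lang05": "valid",
    "lang06": "valid"
};

// Return the TYPE a harness should expect for a test, applying any override.
// `test` is assumed to be an object with string fields `id` and `type`.
function effectiveType(test) {
    if (Object.prototype.hasOwnProperty.call(typeOverrides, test.id)) {
        return typeOverrides[test.id];
    }
    return test.type;
}
```

Tests that are not listed in the table keep whatever type the suite assigns them.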

See xmlconf-20000821.zip for my updated version of the test suite.

The Test Harness

I used the same ECMAScript test harness that David Brownell published, except for one minor modification. This modification stemmed from the issue of what to do with tests marked "valid" that have no DTD (document type definition) at all. David's test harness treated this issue in a manner contrary to the design of MSXML.

For XML documents that have no DTD at all, MSXML successfully loads the documents, even when validateOnParse=true. The MSXML API designers feel that the API is more usable this way since you can still load documents that require validation, regardless of whether their DTDs are available. This is an API design issue, and I believe it should not become a conformance issue. Conformance should be about the parser implementation of the XML 1.0 specification, not about how that parser is packaged. If you really want to verify that a document was validated against a DTD, you need to add an extra check (the doc.doctype != null test) to your code:


var doc = new ActiveXObject("MSXML2.DOMDocument.3.0");
doc.async = false;  // load synchronously so load() reports the parse result
doc.validateOnParse = true;

if (doc.load(test) && doc.doctype != null) {
    // OK: the document is valid and there really was a DTD.
} else {
    // Either the load failed or there was no DTD.
}

David's harness does not do this extra check. Hence, it reports a whole bunch of failures against MSXML. I added this extra check to my modified test harness so that it uses MSXML correctly.
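To make the three possible outcomes explicit, the check can be wrapped in a small helper. This is a minimal sketch, not the code from either harness: classifyLoad and its return strings are hypothetical, and the function takes the document object as a parameter so that it can be exercised with a stand-in object as well as a real MSXML document.

```javascript
// Classify the result of loading `url` into `doc` with validation switched on.
// `doc` is assumed to expose the MSXML DOMDocument members used here:
// async, validateOnParse, load() and doctype.
function classifyLoad(doc, url) {
    doc.async = false;          // wait for the parse to finish
    doc.validateOnParse = true;
    if (!doc.load(url)) {
        return "load-failed";   // not well formed, or invalid against its DTD
    }
    if (doc.doctype == null) {
        return "no-dtd";        // loaded, but there was nothing to validate against
    }
    return "validated";         // loaded, and a DTD really was present
}

// Under cscript the document would be created like this:
//   var doc = new ActiveXObject("MSXML2.DOMDocument.3.0");
//   var outcome = classifyLoad(doc, test);
```

A harness that treats "no-dtd" as a failure for tests marked "valid" gets the stricter behavior; one that treats it as a pass matches what MSXML reports by default.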

MSXML Results

Using the test suite and test harness previously described, the July 2000 MSXML Beta Release achieved the following results.

Mode | Raw Results | Pass Rate
Non-Validating | 1016/1071 | 95%
Validating | 1042/1071 | 97%
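The pass rates in this table are the raw results rounded to the nearest whole percent. A short sketch of that arithmetic follows; passRate is a hypothetical helper name, and rounding to the nearest integer is an assumption that happens to reproduce both figures.

```javascript
// Pass rate as a whole-number percentage, rounded to the nearest integer.
function passRate(passed, total) {
    return Math.round(100 * passed / total);
}

var nonValidating = passRate(1016, 1071); // 94.86...% rounds to 95
var validating = passRate(1042, 1071);    // 97.29...% rounds to 97
```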

A number of failures are due to the fact that, even though validateOnParse=false, MSXML still "processes" the DTDs and external entities so that it can do entity expansion and report default attributes. Whether MSXML loads external entities or not depends on the value of the resolveExternals flag. However, setting this flag to false causes even more failures, because the OASIS tests paradoxically assume that external DTDs and entities are going to be available even in non-validating mode. The OASIS tests should provide a "standalone" indication in the test descriptions so that we could use this information to turn resolveExternals on or off accordingly.
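Both flags are ordinary properties on the document object, so a harness could set them per test. The sketch below shows one way that might look if the test descriptions did carry a standalone indication. The configureParser function and the standalone field are hypothetical (the suite has no such field today, which is the point made above); async, validateOnParse and resolveExternals are the MSXML DOMDocument property names.

```javascript
// Configure a document object for one test run.
// `validating` selects validating or non-validating mode.
// `test.standalone` is a HYPOTHETICAL field: true would mean the test needs
// no external DTD or entities, so external resolution can be switched off.
function configureParser(doc, validating, test) {
    doc.async = false;
    doc.validateOnParse = validating;
    doc.resolveExternals = !(test && test.standalone === true);
    return doc;
}

// Under cscript:
//   var doc = configureParser(new ActiveXObject("MSXML2.DOMDocument.3.0"),
//                             false, { standalone: true });
```

Without the standalone information, the only safe default is to leave resolveExternals on, which is what the sketch does when the field is absent.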

The only way to fully pass the test suite is to silently ignore problems found in the DTDs. The MSXML team believes (from years of experience with Data Access APIs) that silent failures are generally a bad idea. Clearly, this difference in philosophy needs to be resolved.

Non-validating Mode

To test non-validating mode, you set validateOnParse=false on the MSXML2.DOMDocument.3.0 object. (Note that you don't need to specify the version-dependent ProgID if you install the MSXML Beta Release in replace mode. With MSXML installed in replace mode, you can get identical test results using MSXML.DOMDocument.)

The following command line produces the full non-validating parser report:


cscript harness.js /parser=MSXML2.DOMDocument.3.0
    /defparser=MSXML2.DOMDocument.3.0 /nvreport=msxml3-nv.html
    /suite=e:\oasis2000\xmlconf\xmlconf.xml
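The switches on this command line all have the form /name=value. As an illustration only, here is one way a script could split such arguments into a lookup table; parseSwitches is a hypothetical function and is not taken from the harness, whose own argument handling may differ.

```javascript
// Turn an array of "/name=value" strings into an object keyed by name.
// Arguments that do not match the pattern are ignored.
function parseSwitches(args) {
    var options = {};
    for (var i = 0; i < args.length; i++) {
        var match = /^\/([^=]+)=(.*)$/.exec(args[i]);
        if (match) {
            options[match[1].toLowerCase()] = match[2];
        }
    }
    return options;
}

// Under cscript the array would be filled from WScript.Arguments:
//   var args = [];
//   for (var i = 0; i < WScript.Arguments.length; i++) {
//       args.push(WScript.Arguments(i));
//   }
var example = parseSwitches([
    "/parser=MSXML2.DOMDocument.3.0",
    "/nvreport=msxml3-nv.html",
    "/suite=e:\\oasis2000\\xmlconf\\xmlconf.xml"
]);
```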

The following table describes in more detail the failures reported by the test harness.

Bucket | Description
Bad Entities (8 Tests) | Several tests define entities that cannot legally be expanded, like <!ENTITY lt "<">, or entities that simply do not exist, like <!ENTITY noop SYSTEM "nop.ent">. MSXML always expands the entities, even when the document instance does not use them at parse time, because they could be used at run time when createEntityReference is called. I believe these tests conflict with a complete DOM implementation. (valid-sa-065, valid-sa-100, pr-xml-little, pr-xml-utf-16, pr-xml-utf-8, ext01, o-p73pass1, o-p75pass1).
Output Tests (3 Tests) | Several output tests fail because the above input tests fail. (valid-sa-065, valid-sa-100, ext01).
Attribute-Value Normalization (13 Tests) | MSXML 3.0 still does not perform Attribute-Value Normalization as described in the XML specification. (valid-sa-043, valid-sa-058, valid-sa-096, valid-sa-104, valid-sa-108, valid-sa-110, valid-sa-111, not-sa02, not-sa03, not-sa04, notation01, sa02, sa04).
xml:lang (6 Tests) | These tests fail because I changed tests lang01 through lang06 from "invalid" to "valid", since Errata E73 says that the XML processor does not deal with the value of xml:lang; it just passes it on to the application. Unfortunately, MSXML was recently changed to check these values because of the OASIS test suite itself, which was the wrong thing to do. MSXML will be changed back in the next release.
Doing Validation (25 Tests) | MSXML reports some validity constraints even when validateOnParse=false, like "Parameter entity not defined" or "The replacement text for a parameter entity must be properly nested with parenthesized groups". (not-wf-not-sa-005, invalid--001, invalid--002, invalid--003, invalid--004, invalid--005, invalid--006, inv-dtd01, inv-dtd02, inv-dtd06, el04, el05, id03, id04, id05, attr04, attr08, attr09, attr10, attr11, attr12, attr13, attr14, attr15, attr16).
Total | 55 failures
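For readers unfamiliar with the attribute-value normalization bucket, the rule in section 3.3.3 of the XML 1.0 specification can be sketched roughly as follows. This is a deliberately simplified illustration, not MSXML code: normalizeAttributeValue is a hypothetical name, and the sketch assumes the raw value contains no character or entity references, which the real algorithm must handle separately.

```javascript
// Simplified attribute-value normalization (XML 1.0, section 3.3.3).
// ASSUMPTION: `raw` contains no character or entity references.
// `isCData` is true when the attribute is declared CDATA or is undeclared.
function normalizeAttributeValue(raw, isCData) {
    // Line ends are normalized first: CR LF pairs and lone CRs become LF.
    var value = raw.replace(/\r\n?/g, "\n");
    // Every remaining tab or line feed is replaced by a single space.
    value = value.replace(/[\t\n]/g, " ");
    if (!isCData) {
        // Non-CDATA types: strip leading and trailing spaces,
        // then collapse each run of spaces to one space.
        value = value.replace(/^ +| +$/g, "").replace(/ {2,}/g, " ");
    }
    return value;
}
```

For example, a CDATA value of "a", tab, "b", CR LF, "c" normalizes to "a b c", while a non-CDATA value with padding and repeated spaces is trimmed and collapsed.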

Validating Mode

To test validating mode, you set validateOnParse=true on the MSXML2.DOMDocument.3.0 object. (Note that you don't need to specify the version-dependent ProgID if you install the MSXML Beta Release in replace mode. With MSXML installed in replace mode, you can get identical test results using MSXML.DOMDocument).

The following command line produces the full validating parser report:

cscript harness.js /parser=MSXML2.DOMDocument.3.0
    /defparser=MSXML2.DOMDocument.3.0 /vreport=msxml3-val.html
    /suite=e:\oasis2000\xmlconf\xmlconf.xml

The following table describes in more detail the failures reported by the test harness.

Bucket | Description
Bad Entities (7 Tests) | The same issues as with non-validating mode.
Output Tests (3 Tests) | Several output tests fail because the above input tests fail. (valid-sa-012, valid-sa-065, valid-sa-100).
Attribute-Value Normalization (13 Tests) | The same issues as with non-validating mode.
xml:lang (6 Tests) | The same issues as with non-validating mode.
Total | 29 failures
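As a cross-check, the bucket counts in the two DOM failure tables can be added up and compared with the raw results reported earlier. The sketch below does that; the variable and function names are hypothetical, and the counts are copied from the tables.

```javascript
var TOTAL_TESTS = 1071;

// Bucket sizes copied from the non-validating and validating tables.
var nonValidatingBuckets = [
    { name: "Bad Entities", count: 8 },
    { name: "Output Tests", count: 3 },
    { name: "Attribute-Value Normalization", count: 13 },
    { name: "xml:lang", count: 6 },
    { name: "Doing Validation", count: 25 }
];
var validatingBuckets = [
    { name: "Bad Entities", count: 7 },
    { name: "Output Tests", count: 3 },
    { name: "Attribute-Value Normalization", count: 13 },
    { name: "xml:lang", count: 6 }
];

// Add up the `count` field of every bucket.
function sumCounts(buckets) {
    var total = 0;
    for (var i = 0; i < buckets.length; i++) {
        total += buckets[i].count;
    }
    return total;
}

// Each sum should equal the number of tests that did not pass.
var nvConsistent = sumCounts(nonValidatingBuckets) === TOTAL_TESTS - 1016;
var vConsistent = sumCounts(validatingBuckets) === TOTAL_TESTS - 1042;
```

Both comparisons hold: 55 failures in non-validating mode and 29 in validating mode.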

MSXML Conformance Via SAX

MSXML 3.0 now includes a SAX interface that you can use from Visual Basic applications. So, I ported David's ECMAScript test harness to Visual Basic, where I could test the compliance of this new component.

The MSXML SAX interface only supports non-validating mode at this time. Given that, the Visual Basic harness (see VbSaxTest.zip for the source code and executable) produces the same kind of report as David's harness.

Mode | Raw Results | Pass Rate
Non-Validating | 1035/1071 | 97%

The following table summarizes the detailed report generated by the test harness:

Bucket | Description
Non-existent entities (2 Tests) | Only one test (ext01) is affected, but it is counted twice because it also causes the corresponding output test to fail.
Whitespace Normalization (21 Tests) | This is mostly end-of-line handling (converting 0d 0a pairs into a single 0a).
Doing Validation (5 Tests) | These tests fail because the SAX parser reports validity constraints when it is running in non-validating mode. These are mostly related to the "Proper Declaration/PE Nesting" validity constraint.
xml:lang (6 Tests) | The same issues as with non-validating mode.
Total | 30 failures
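The end-of-line handling mentioned in the whitespace bucket is defined in section 2.11 of the XML 1.0 specification: before parsing, each 0d 0a pair, and each 0d that is not followed by 0a, is translated to a single 0a. A minimal sketch of that rule follows; normalizeLineEndings is a hypothetical name and this is not the MSXML implementation.

```javascript
// Translate CR LF pairs and lone CR characters to a single LF,
// as XML 1.0 section 2.11 requires before text reaches the application.
function normalizeLineEndings(text) {
    return text.replace(/\r\n?/g, "\n");
}
```

A parser that skips this step hands 0d characters through to the application, which is enough to make an output comparison fail.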

Conclusion

MSXML still has some issues to resolve relating to non-existent or malformed unused entities, attribute-value normalization, end-of-line handling, and reporting validity constraints when running in non-validating mode. However, you can see from the following table that MSXML is on a steady march towards 100% compliance.

Version | Non-Validating | Validating
MSXML 2.5 | 87% (931/1067) | 83% (895/1067)
MSXML3 May Tech Preview Release | 88% (941/1067) | 85% (913/1067)
MSXML3 Beta Release | 95% (1016/1071) for DOM; 97% (1035/1071) for SAX | 97% (1042/1071)