An Overview of MSXML 4.0
June 4, 2002
In this article we will look at the latest XML parser from Microsoft, MSXML 4.0. Microsoft has taken a lot of criticism in the past over its adoption of non-standard schema and XSL drafts, which happened most often in early parser releases and even version 3.0 to a lesser extent. However, MSXML 4.0 is a strong attempt by Microsoft to adopt W3C standards. You can now find standard DOM, XPath, Schema, and XSLT implementations in the new parser. There is even full support for SAX 2.0, along with many other objects that improve both your productivity when working with XML on the client and server and the scalability of server-based XML applications.
Note: To work with the samples in this article you should have MSXML 4.0 installed on your computer, and IE5+ for the purpose of viewing the samples. Unzip the sample files to a directory called "MSXMLFiles".
MSXML DOM
The MSXML DOM implementation exposes an API containing a set of interfaces for loading and parsing documents, working with document nodes, selecting document fragments, and dynamically validating documents against XML Schemas. There are actually two interfaces which may be used when working with an XML DOMDocument in MSXML. The first, IXMLDOMDocument, implements the W3C DOM Level 1 API with some extensions providing support specified in DOM Level 2. The second interface, IXMLDOMDocument2, is an extension of the IXMLDOMDocument interface and provides further extensions supporting schema caching and validation, as well as some additional property switches for namespace support and better parser performance. When using MSXML from a scripting environment (such as DHTML or ASP), all the methods and properties of these interfaces are available.
Sample XML document
The following listing shows the basic XML file we'll work with in this article.
<?xml version="1.0" encoding="utf-8" ?>
<lists xmlns="http://deltabis.com/products"
       xmlns:it="http://deltabis.com/itinerary">
  <product sku="8822N" size="small" type="trouser">
    <it:itinerary>
      <it:sold>120</it:sold>
      <it:onhold>45</it:onhold>
      <it:returned>10</it:returned>
    </it:itinerary>
  </product>
  <product sku="9820Y" size="small" type="tshirt">
    <it:itinerary>
      <it:sold>283</it:sold>
      <it:onhold>232</it:onhold>
      <it:returned>23</it:returned>
    </it:itinerary>
  </product>
  <product sku="9922A" size="large" type="cap">
    <it:itinerary>
      <it:sold>342</it:sold>
      <it:onhold>54</it:onhold>
      <it:returned>5</it:returned>
    </it:itinerary>
  </product>
</lists>
It illustrates a simple product catalog with details like product SKU, size, and type of product, all of which are in the http://deltabis.com/products namespace. Also included is itinerary information on what has been sold, what is on hold, and what has been returned, which is in the http://deltabis.com/itinerary namespace.
Loading a document
Loading a document is simple, as demonstrated by the following:
function LoadDocument() {
    var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
    // Load synchronously so the document is ready on the next line
    objXML.async = false;
    objXML.load("list.xml");
    document.all("result").value = objXML.xml;
}
There is something new here about working with the MSXML 4.0 parser: every instance of a DOMDocument must be created with a version-dependent ProgID. This allows previous versions of MSXML to work side by side and not be affected by the installation of the new parser. To create an MSXML 4.0 version-specific instance of the DOMDocument object in JavaScript, you'd write
var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
or in VBScript,
Set objXML = CreateObject("MSXML2.DOMDocument.4.0")
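Because each version has its own ProgID, a page can ask for MSXML 4.0 first and fall back to an older version that is installed side by side. The following is only a minimal sketch of that idea: createFirstAvailable and its factory parameter are hypothetical names introduced here, not part of MSXML, and the creation call is passed in so the fallback logic can be read on its own.

```javascript
// Hypothetical helper (not part of MSXML): try each ProgID in order and
// return the first object that the supplied factory manages to create.
function createFirstAvailable(progIds, factory) {
    for (var i = 0; i < progIds.length; i++) {
        try {
            return { progId: progIds[i], object: factory(progIds[i]) };
        } catch (e) {
            // This version could not be created; try the next ProgID.
        }
    }
    throw new Error("None of the requested ProgIDs could be created: " +
        progIds.join(", "));
}

// In Internet Explorer the factory would wrap ActiveXObject, for example:
//   var result = createFirstAvailable(
//       ["MSXML2.DOMDocument.4.0", "MSXML2.DOMDocument.3.0"],
//       function (id) { return new ActiveXObject(id); });
//   var objXML = result.object;
```

Keeping the ProgID in the returned object lets the calling code check which version it actually received before relying on 4.0-only features.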
Qualifying XPath Queries using SelectionNamespaces
The SelectionNamespaces property has been available since MSXML 3.0, but it's worth looking at an example of how it can be used in MSXML 4.0. If you look at the sample XML document, you will see there are two namespaces within the document: first, http://deltabis.com/products, the default namespace, and second, http://deltabis.com/itinerary. Imagine you wanted a list of the items sold. The sold element and its ancestors are qualified by these namespaces, and a straight XPath query with unprefixed names can only match nodes that are not in any namespace (and in the sample document every element is qualified). So the code below will not give us the result we are looking for; rather, it will write out zero as the number of nodes selected.
function GetTShirts() {
    var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
    objXML.async = false;
    objXML.load("list.xml");
    // Unprefixed names match only elements in no namespace, so nothing is selected
    var objNodes = objXML.selectNodes(
        "/lists/product[@type='tshirt']/itinerary/sold");
    document.all("result").value = objNodes.length;
}
To get round this, the SelectionNamespaces property is set using the setProperty() method of the DOMDocument object:
function GetTShirts() {
    var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
    objXML.async = false;
    objXML.load("list.xml");
    // Bind prefixes for use in XPath; they need not match the document's prefixes
    objXML.setProperty("SelectionNamespaces",
        "xmlns:pro='http://deltabis.com/products' " +
        "xmlns:itin='http://deltabis.com/itinerary'");
    var objNodes = objXML.selectNodes(
        "//pro:lists/pro:product[@type='tshirt']/itin:itinerary/itin:sold");
    for (var i = 0; i < objNodes.length; i++)
        document.all("result").value += objNodes[i].xml + "\n";
}
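The value passed for SelectionNamespaces is simply a space-separated list of xmlns:prefix='URI' declarations, so it can be assembled from data rather than typed by hand. Here is a small sketch of that; buildSelectionNamespaces is a hypothetical helper name, not an MSXML function.

```javascript
// Hypothetical helper (not part of MSXML): turn a list of prefix/URI
// bindings into the string format expected by SelectionNamespaces.
function buildSelectionNamespaces(bindings) {
    var parts = [];
    for (var i = 0; i < bindings.length; i++) {
        parts.push("xmlns:" + bindings[i].prefix + "='" + bindings[i].uri + "'");
    }
    return parts.join(" ");
}

var selectionNamespaces = buildSelectionNamespaces([
    { prefix: "pro", uri: "http://deltabis.com/products" },
    { prefix: "itin", uri: "http://deltabis.com/itinerary" }
]);

// With a loaded DOMDocument this would be used as:
//   objXML.setProperty("SelectionNamespaces", selectionNamespaces);
```

Note that the default namespace of the document still needs a prefix here (pro in this sketch), because XPath 1.0 has no way to apply a default namespace to unprefixed names.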
Additionally, MSXML 4.0 supports the NewParser property in the setProperty() method, which instructs MSXML to use a new parser that offers greater performance but does not yet support asynchronous mode or DTD validation. Using the new parser can improve parsing performance by between 200 and 400% for XSLT transformations. The new parser is enabled with the following line of code (the property has been set in sample3.htm):
objXML.setProperty("NewParser", true);
We've looked at some of the features of the DOM in MSXML. Let's now look at probably the most significant addition to the MSXML parser: W3C XML Schema support.
W3C XML Schema
To date MSXML parsers have supported XML Data-Reduced (XDR), an implementation based on a W3C NOTE. MSXML 4.0 continues to support XDR, but it now offers W3C XML Schema Definition Language (XSDL) as its recommended schema language. With the huge number of advantages it offers over XDR, you'd be crazy not to use it anyway, unless you have legacy code, of course. This article doesn't intend to cover XSDL itself, but we will discuss how you can validate our sample XML file using MSXML 4.0's XSDL support.
There are two ways that instances can be validated against XSDL in MSXML 4.0: either declare the schema at the root of the XML instance or programmatically validate instances against a schema. To perform validation by declaring your schema reference in the XML instance, you must alter the document element of our sample XML to add the schema namespace for instances, which is xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". You must also define the schema files using the schemaLocation attribute (in the schema instance namespace). The value of this attribute is a whitespace-delimited list of namespace and schema location pairs; if there is more than one namespace to be validated, then each pair is in turn separated by whitespace. The following shows how this would look for our sample XML instance.
<lists xmlns="http://deltabis.com/products"
       xmlns:it="http://deltabis.com/itinerary"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://deltabis.com/products sampleSchema.xsd
                           http://deltabis.com/itinerary sampleSchema2.xsd">
Notice that the schemaLocation attribute has the http://deltabis.com/products namespace, then a space, and then the actual schema file, which is sampleSchema.xsd. There's a similar entry for the http://deltabis.com/itinerary namespace. You can run this sample by opening sample4.htm. The result is shown in Figure 1 below:

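To make the pairing rule for schemaLocation concrete, the sketch below splits an attribute value into namespace and location pairs. It is an illustration only: parseSchemaLocation is a hypothetical helper, and MSXML does this work internally when it loads the instance.

```javascript
// Hypothetical helper (MSXML does this internally): split a schemaLocation
// value into { namespaceURI, location } pairs.
function parseSchemaLocation(value) {
    var trimmed = value.replace(/^\s+|\s+$/g, "");
    if (trimmed === "") {
        return [];
    }
    // Any run of whitespace separates tokens, both within and between pairs
    var tokens = trimmed.split(/\s+/);
    if (tokens.length % 2 !== 0) {
        throw new Error("schemaLocation must hold namespace/location pairs");
    }
    var pairs = [];
    for (var i = 0; i < tokens.length; i += 2) {
        pairs.push({ namespaceURI: tokens[i], location: tokens[i + 1] });
    }
    return pairs;
}

var pairs = parseSchemaLocation(
    "http://deltabis.com/products sampleSchema.xsd " +
    "http://deltabis.com/itinerary sampleSchema2.xsd");
// pairs[0].location is "sampleSchema.xsd"; pairs[1].location is "sampleSchema2.xsd"
```

An odd number of tokens means a namespace was listed without a schema file (or the reverse), which is the most common mistake when writing this attribute by hand.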
The second way we can validate an XML instance against XSDL using MSXML 4.0 is to use the XMLSchemaCache object, which holds a cache of XML Schema documents associated with namespace URIs (using the add() method of the XMLSchemaCache object). The schemas property of the DOMDocument can then use this cache to validate a loaded document instance dynamically. The ValidateXML() function from sample5.htm, which dynamically validates our XML instance against the sample schema documents, is shown below.
function ValidateXML(i) {
    // Load the XML Schema documents
    var xmlXSDDoc = new ActiveXObject("Msxml2.DOMDocument.4.0");
    xmlXSDDoc.async = false;
    xmlXSDDoc.load("SampleSchema.xsd");
    var xmlXSDDoc2 = new ActiveXObject("Msxml2.DOMDocument.4.0");
    xmlXSDDoc2.async = false;
    xmlXSDDoc2.load("SampleSchema2.xsd");

    // Associate each schema with its namespace URI in the cache
    var cache = new ActiveXObject("Msxml2.XMLSchemaCache.4.0");
    cache.add("http://deltabis.com/products", xmlXSDDoc);
    cache.add("http://deltabis.com/itinerary", xmlXSDDoc2);

    // Attach the schema cache before loading the instance
    var xmlDoc = new ActiveXObject("Msxml2.DOMDocument.4.0");
    xmlDoc.async = false;
    xmlDoc.schemas = cache;

    var strXML;
    var bol; // declared locally; the original assigned it as an implicit global
    if (i == 0) {
        // Load from the file named in the XMLfile field
        strXML = document.all("XMLfile").value;
        bol = xmlDoc.load(strXML);
    } else {
        // Load from the XML text in the XML field
        strXML = document.all("XML").value;
        bol = xmlDoc.loadXML(strXML);
    }

    if (xmlDoc.parseError.errorCode != 0) {
        alert(xmlDoc.parseError.reason + "\n" + xmlDoc.parseError.srcText);
    } else {
        document.all("XML").value = xmlDoc.xml;
        alert("File is valid.");
    }
}
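The error branch above reports only the reason and srcText properties. The parseError object also exposes line and linepos, which help locate the offending node in a larger instance. The following sketch formats all of these into one message; describeParseError is a hypothetical helper name, and it accepts any object with the same property names so it can be tried without MSXML.

```javascript
// Hypothetical helper: build a readable message from a parseError-like object
// (errorCode, reason, line, linepos and srcText are parseError properties).
function describeParseError(err) {
    if (err.errorCode === 0) {
        return "No error.";
    }
    // reason may end with a line break, so trim surrounding whitespace
    var reason = String(err.reason).replace(/^\s+|\s+$/g, "");
    var message = "Error " + err.errorCode + ": " + reason;
    if (err.line > 0) {
        message += " (line " + err.line + ", position " + err.linepos + ")";
    }
    if (err.srcText) {
        message += "\n" + err.srcText;
    }
    return message;
}

// In ValidateXML() this could replace the alert in the error branch:
//   alert(describeParseError(xmlDoc.parseError));
```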
So, if we modify the value of one of the sold elements from a numeric to a non-numeric value, we will get a validation error message as shown in Figure 2 below:

The ability to validate XML instances against XSDL is the most significant improvement to the MSXML parser.
MSXML 4.0 also contains the Schema Object Model (SOM), which could become very popular. When used with the DOM, SOM gives you access to an XSDL document, allowing you to programmatically interrogate it. At its most basic level, this is a reflection technique, with which you can dynamically generate front end HTML screens with validation or create sub-schemas.
XPath & XSLT
Beyond implementing the standard XPath 1.0 W3C recommendation, MSXML 4.0 also adds extension functions to support XSDL as well as some other miscellaneous functions.
The XSD-related extension functions include the following.
- ms:type-is(URI, local-name)
  This allows you to compare the data type of the current node against an XSD data type. For example, the following will return true for a node that is of the XSD decimal type.
  ms:type-is("http://www.w3.org/2001/XMLSchema","decimal")
- ms:type-local-name([node-set])
  This returns the nonqualified name of the XSD type of the current node or of the first node of a node-set argument.
- ms:type-namespace-uri([node-set])
  This returns the namespace URI associated with the XSD type of the current node or of the first node of the node-set argument.
- ms:schema-info-available()
  This function will return true if XSD schema information is available for the current node.
The assorted miscellaneous functions include the following.
- ms:string-compare(string1, string2, language, options)
  This function compares string1 and string2 lexicographically (dictionary order) based on the language parameter and the case sensitivity defined in the options parameter. It returns -1 if string1 < string2, 0 if string1 = string2, and 1 if string1 > string2. For example:
  ms:string-compare("a", "A", "en-US") returns -1.
- ms:utc(string)
  Converts date and time values into coordinated universal time (UTC).
- ms:namespace-uri(string)
  Takes a qualified name as a string and returns the namespace URI bound to its prefix.
- ms:local-name(string)
  Takes a qualified name as a string and returns its local part, that is, the name without the prefix.
- ms:number(string)
  Takes a string in XSD number format and converts it to an XPath number. For example:
  ms:number("5.9e5") returns 5.9e5
- ms:format-date(datetime, format, locale)
  Takes an XSD date and converts it to the date format specified by the format parameter in the specified locale. The format is based on the Win32 API GetDateFormat() function.
- ms:format-time(datetime, format, locale)
  Takes an XSD time and converts it to the time format specified by the format parameter in the specified locale. The format is based on the Win32 API GetTimeFormat() function.
If you've worked with MSXML, then you'll be well aware of the XSL implementation (based on the December 1998 W3C Working Draft) available with previous parser releases, identified by the namespace http://www.w3.org/TR/WD-xsl. As of MSXML 4.0, support for this XSL namespace has been completely dropped, which is of critical importance if you plan to use MSXML 4.0 and have not upgraded your XSL documents to work with the W3C XSLT 1.0 specification.
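A quick way to find stylesheets that still need upgrading is to look for the old namespace URI in their source. The sketch below does this with a plain substring check; classifyStylesheet is a hypothetical helper name, and the check assumes the namespace URI appears literally in the stylesheet text rather than being parsed from it.

```javascript
// Namespace URIs of the 1998 working draft and of XSLT 1.0
var WD_XSL_NAMESPACE = "http://www.w3.org/TR/WD-xsl";
var XSLT_NAMESPACE = "http://www.w3.org/1999/XSL/Transform";

// Hypothetical helper: classify a stylesheet by the namespace URI it mentions.
// This is a plain text search, not an XML parse.
function classifyStylesheet(text) {
    if (text.indexOf(WD_XSL_NAMESPACE) !== -1) {
        return "WD-xsl";      // needs upgrading before use with MSXML 4.0
    }
    if (text.indexOf(XSLT_NAMESPACE) !== -1) {
        return "XSLT 1.0";
    }
    return "unknown";
}

var kind = classifyStylesheet(
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"/>');
// kind is "WD-xsl"
```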
We can use the extended XPath functions to get information about the XSDL types defined on our XML instance document. The following XSLT document (sample.xsl) uses the ms:type-is() function and simply outputs the number of elements that have been declared as the XSDL "int" type within the XML instance document.
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ms="urn:schemas-microsoft-com:xslt">
  <xsl:template match="/">
    There are <xsl:value-of select="count(//*[ms:type-is('http://www.w3.org/2001/XMLSchema','int')])" /> integer element types in the instance.
  </xsl:template>
</xsl:transform>
In our case, the result is "There are 6 integer element types in the instance." We can create powerful dynamic front ends and web services with fully featured validation simply by interrogating the SOM for a given XML Schema document.
That's the end of the overview. If you want to learn more, I suggest you download the MSXML 4.0 parser and documentation. You might also be interested in my book, XML Application Development with MSXML 4.0.