
An Overview of MSXML 4.0

June 4, 2002

Steven Livingstone

In this article we will look at the latest XML parser from Microsoft, MSXML 4.0. Microsoft has taken a lot of criticism in the past over its adoption of non-standard schema and XSL drafts, which happened most often in early parser releases and, to a lesser extent, even in version 3.0. However, MSXML 4.0 is a strong attempt by Microsoft to adopt W3C standards. You can now find standard DOM, XPath, Schema, and XSLT implementations in the new parser. There is even full support for SAX 2.0, as well as many other objects that improve both your productivity when working with XML on the client and server and the scalability of server-based XML applications.

Note: To work with the samples in this article you should have MSXML 4.0 installed on your computer, and IE5+ for the purpose of viewing the samples. Unzip the sample files to a directory called "MSXMLFiles".

MSXML DOM

The MSXML DOM implementation exposes an API containing a set of interfaces for loading and parsing documents, working with document nodes, selecting document fragments, and dynamically validating documents against XML Schemas. There are actually two interfaces which may be used when working with an XML DOMDocument in MSXML. The first, IXMLDOMDocument, implements the W3C DOM Level 1 API with some extensions providing support specified in DOM Level 2. The second interface, IXMLDOMDocument2, extends IXMLDOMDocument and adds support for schema caching and validation, as well as some additional property switches for namespace support and better parser performance. When using MSXML from a scripting environment (such as DHTML or ASP), all the methods and properties of these interfaces will be available.

Sample XML document

The following listing shows the basic XML file we'll work with in this article.


<?xml version="1.0" encoding="utf-8" ?>
<lists xmlns="http://deltabis.com/products" xmlns:it="http://deltabis.com/itinerary">
	<product sku="8822N" size="small" type="trouser">
		<it:itinerary>
			<it:sold>120</it:sold>
			<it:onhold>45</it:onhold>
			<it:returned>10</it:returned>
		</it:itinerary>
	</product>
	<product sku="9820Y" size="small" type="tshirt">
		<it:itinerary>
			<it:sold>283</it:sold>
			<it:onhold>232</it:onhold>
			<it:returned>23</it:returned>
		</it:itinerary>
	</product>
	<product sku="9922A" size="large" type="cap">
		<it:itinerary>
			<it:sold>342</it:sold>
			<it:onhold>54</it:onhold>
			<it:returned>5</it:returned>
		</it:itinerary>
	</product>
</lists>

It illustrates a simple product catalog with details like product SKU, size and type of product, all of which are in the http://deltabis.com/products namespace. Also included is itinerary information on what has been sold, what is on hold, and what has been returned, which are in the http://deltabis.com/itinerary namespace.

Loading a document

Loading a document is simple, as demonstrated by the following:


function LoadDocument()
{
	var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.load("list.xml");
	document.all("result").value = objXML.xml;
}

There is something new here about working with the MSXML 4.0 parser: every DOMDocument instance must be created with a version-dependent ProgID. This allows previous versions of MSXML to work side by side with the new parser without being affected by its installation. To create an MSXML 4.0 version-specific instance of the DOMDocument object in JavaScript, you'd write

var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");

or in VBScript,

Set objXML = CreateObject("MSXML2.DOMDocument.4.0")
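
As a minimal sketch of how a page might cope with machines where MSXML 4.0 is not installed, the helper below tries a list of version-dependent ProgIDs in order. The function name createDomDocument, the fallback to MSXML2.DOMDocument.3.0, and the injectable factory argument are assumptions made for illustration; none of them is part of MSXML.

```javascript
// Hypothetical helper: try each version-dependent ProgID in turn and
// return the first object that can be created, with the ProgID used.
// "factory" is an assumption added so the logic can be exercised outside IE;
// when omitted it wraps ActiveXObject.
function createDomDocument(progIds, factory)
{
	factory = factory || function (progId) { return new ActiveXObject(progId); };
	for (var i = 0; i < progIds.length; i++)
	{
		try
		{
			return { progId: progIds[i], doc: factory(progIds[i]) };
		}
		catch (e)
		{
			// This version could not be created; try the next ProgID
		}
	}
	throw new Error("None of the requested MSXML versions could be created");
}

// Usage in IE (not run here):
// var result = createDomDocument(["MSXML2.DOMDocument.4.0", "MSXML2.DOMDocument.3.0"]);
// result.doc.async = false;
```

Returning the ProgID alongside the document lets the caller check which version it actually received before relying on features that only MSXML 4.0 provides, such as XSDL validation.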

Qualifying XPath Queries using SelectionNamespaces

The SelectionNamespaces property has been available since MSXML 3.0, but it's worth looking at an example of how it can be used in MSXML 4.0. If you look at the sample XML document, you will see there are two namespaces within the document: first, http://deltabis.com/products, the default namespace, and second, http://deltabis.com/itinerary. Imagine you wanted a list of the items sold. The elements on the path to sold are qualified by these two namespaces, and an XPath query that uses unprefixed names can only match elements that are not in any namespace (and in the sample document everything is qualified). So the code below will not give us the result we are looking for; rather, it will write out zero as the number of nodes selected.


function GetTShirts()
{
	var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.load("list.xml");

	// Unprefixed names only match elements in no namespace, so nothing is selected
	var objNodes = objXML.selectNodes(
		"/lists/product[@type='tshirt']/itinerary/sold");

	document.all("result").value = objNodes.length;
}

To get round this, the SelectionNamespaces property is used in combination with the setProperty() method of the DOMDocument object:


function GetTShirts()
{
	var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.load("list.xml");

	// Bind a prefix to each namespace URI for use in XPath queries
	objXML.setProperty("SelectionNamespaces",
		"xmlns:pro='http://deltabis.com/products' xmlns:itin='http://deltabis.com/itinerary'");

	var objNodes = objXML.selectNodes(
		"//pro:lists/pro:product[@type='tshirt']/itin:itinerary/itin:sold");

	// Write the markup of each selected node to the result field
	for (var i = 0; i < objNodes.length; i++)
		document.all("result").value += objNodes.item(i).xml + "\n";
}
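
Note that the prefixes used in the query (pro and itin) do not have to match those in the document, because XPath matches on the namespace URI rather than the prefix. When several namespaces are involved, the property value can be assembled from a map instead of being typed by hand. The following is only a sketch; buildSelectionNamespaces is a hypothetical helper name, not an MSXML function.

```javascript
// Hypothetical helper: turn a prefix-to-URI map into the space-separated
// list of xmlns declarations used as the SelectionNamespaces value.
function buildSelectionNamespaces(prefixes)
{
	var parts = [];
	for (var prefix in prefixes)
	{
		if (Object.prototype.hasOwnProperty.call(prefixes, prefix))
		{
			parts.push("xmlns:" + prefix + "='" + prefixes[prefix] + "'");
		}
	}
	return parts.join(" ");
}

// Usage in IE (not run here):
// objXML.setProperty("SelectionNamespaces", buildSelectionNamespaces({
//	pro: "http://deltabis.com/products",
//	itin: "http://deltabis.com/itinerary"
// }));
```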

Additionally, MSXML 4.0 supports the NewParser property in the setProperty() method, which instructs MSXML to use a new parser that offers greater performance but does not yet support asynchronous mode or DTD validation. Using the new parser can improve parsing performance by between 200 and 400% for XSLT transformations. The new parser is enabled with the following line of code (the property has been set in sample3.htm):


	objXML.setProperty("NewParser", true); 
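
Because the new parser does not support asynchronous mode, a page might guard the switch rather than set it unconditionally. The sketch below assumes a hypothetical helper named enableNewParser; it checks only the async flag and has no way of knowing whether the document relies on DTD validation.

```javascript
// Hypothetical guard: switch on NewParser only for synchronous documents,
// since the new parser does not support asynchronous mode.
// It cannot detect whether the document depends on DTD validation.
function enableNewParser(doc)
{
	if (doc.async)
	{
		return false;	// leave the default parser in place
	}
	doc.setProperty("NewParser", true);
	return true;
}

// Usage in IE (not run here):
// objXML.async = false;
// enableNewParser(objXML);
```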

We've looked at some of the features of the DOM in MSXML. Let's now look at probably the most significant addition to the MSXML parser: W3C XML Schema support.

W3C XML Schema

To date MSXML parsers have supported XML-Data Reduced (XDR), an implementation based on a W3C NOTE. MSXML 4.0 continues to support XDR, but it now offers W3C XML Schema Definition Language (XSDL) as its recommended schema language. With the huge number of advantages it offers over XDR, you'd be crazy not to use it anyway -- unless you have legacy code, of course. This article doesn't intend to cover XSDL, but we will discuss how you can validate our sample XML file using MSXML 4.0's XSDL support.

There are two ways that instances can be validated against XSDL in MSXML 4.0: either declare the schema at the root of the XML instance or programmatically validate instances against a schema. To perform validation by declaring your schema reference in the XML instance, you must alter the document element of our sample XML to add the schema namespace for instances, which is xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". You must also identify the schema files using the schemaLocation attribute (in the schema instance namespace). The value of this attribute is a whitespace-delimited list of pairs, each consisting of a namespace URI followed by a schema location; if there is more than one namespace to be validated, the pairs are in turn separated from each other by whitespace. The following shows how this would look for our sample XML instance.


<lists xmlns="http://deltabis.com/products" xmlns:it="http://deltabis.com/itinerary"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://deltabis.com/products sampleSchema.xsd
		http://deltabis.com/itinerary sampleSchema2.xsd">

Notice that the schemaLocation attribute lists the http://deltabis.com/products namespace, followed by a space and then the actual schema file, which is sampleSchema.xsd. There's a similar entry for the http://deltabis.com/itinerary namespace. You can run this sample by opening sample4.htm. The result is shown in Figure 1 below:


Figure 1 -- Successful validation of a document
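
To make the pairing rule for schemaLocation concrete, here is a small sketch that splits an attribute value into namespace and location pairs. The helper name parseSchemaLocation and the shape of the returned objects are assumptions for illustration; MSXML does this work internally and exposes no such function.

```javascript
// Hypothetical helper: split an xsi:schemaLocation value into
// { namespaceURI, location } pairs.
function parseSchemaLocation(value)
{
	// Trim, then split on any run of whitespace (spaces, tabs, line breaks)
	var tokens = value.replace(/^\s+|\s+$/g, "").split(/\s+/);
	var pairs = [];
	if (tokens.length === 1 && tokens[0] === "")
	{
		return pairs;	// empty attribute value
	}
	if (tokens.length % 2 !== 0)
	{
		throw new Error("schemaLocation must contain namespace/location pairs");
	}
	for (var i = 0; i < tokens.length; i += 2)
	{
		pairs.push({ namespaceURI: tokens[i], location: tokens[i + 1] });
	}
	return pairs;
}
```

Applied to the attribute value in the listing above, this yields two pairs: one mapping http://deltabis.com/products to sampleSchema.xsd and one mapping http://deltabis.com/itinerary to sampleSchema2.xsd.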

The second way we can validate an XML Instance against XSDL using MSXML 4.0 is to use the XMLSchemaCache object, which creates a cache of the XML Schema documents associated with namespace URIs (using the add() method of the XMLSchemaCache object). The schemas property of the DOMDocument can then use this cache to validate a loaded document instance dynamically.

The ValidateXML() function from sample5.htm, shown below, dynamically validates our XML instance against the sample schema documents.


	function ValidateXML(i)
	{
		// Load the XML Schema documents
		var xmlXSDDoc = new ActiveXObject("Msxml2.DOMDocument.4.0");
		xmlXSDDoc.async = false;
		xmlXSDDoc.load("SampleSchema.xsd");

		var xmlXSDDoc2 = new ActiveXObject("Msxml2.DOMDocument.4.0");
		xmlXSDDoc2.async = false;
		xmlXSDDoc2.load("SampleSchema2.xsd");

		// Associate each schema with its namespace URI in the cache
		var cache = new ActiveXObject("Msxml2.XMLSchemaCache.4.0");
		cache.add("http://deltabis.com/products", xmlXSDDoc);
		cache.add("http://deltabis.com/itinerary", xmlXSDDoc2);

		// Attach the cache before loading the instance document
		var xmlDoc = new ActiveXObject("Msxml2.DOMDocument.4.0");
		xmlDoc.async = false;
		xmlDoc.schemas = cache;

		if (i == 0)
		{
			// Load from the file named in the XMLfile field
			xmlDoc.load(document.all("XMLfile").value);
		}
		else
		{
			// Parse the markup typed into the XML field
			xmlDoc.loadXML(document.all("XML").value);
		}

		if (xmlDoc.parseError.errorCode != 0)
		{
			alert(xmlDoc.parseError.reason + "\n" +
				xmlDoc.parseError.srcText);
		}
		else
		{
			document.all("XML").value = xmlDoc.xml;
			alert("File is valid.");
		}
	}



So, if we modify the value of one of the sold elements from a numeric to a non-numeric value, we will get a validation error message as shown in Figure 2 below:


Figure 2
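
The alert in ValidateXML() shows only the reason and source text. The parseError object also carries the line and linepos properties, so a slightly more informative message can be built along the following lines. This is a sketch; describeParseError is a hypothetical helper name and the message layout is an arbitrary choice.

```javascript
// Hypothetical helper: build a one-message summary from a parseError object.
// Uses the errorCode, reason, line, linepos and srcText properties.
function describeParseError(parseError)
{
	if (parseError.errorCode == 0)
	{
		return "";	// no error to report
	}
	// Strip leading and trailing whitespace (including line breaks) from the reason
	var reason = String(parseError.reason).replace(/^\s+|\s+$/g, "");
	return "Line " + parseError.line + ", position " + parseError.linepos +
		": " + reason + "\n" + parseError.srcText;
}

// Usage in IE (not run here):
// var message = describeParseError(xmlDoc.parseError);
// if (message != "") alert(message);
```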

The ability to validate XML instances against XSDL is the most significant improvement to the MSXML parser.

MSXML 4.0 also contains the Schema Object Model (SOM), which could become very popular. When used with the DOM, SOM gives you access to an XSDL document, allowing you to programmatically interrogate it. At its most basic level, this is a reflection technique, with which you can dynamically generate front end HTML screens with validation or create sub-schemas.
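
As a rough sketch of what interrogating a schema through the SOM might look like, the function below lists the names of the top-level element declarations for one namespace in a schema cache. It assumes the cache's getSchema() method and the elements collection (with length and item()) of the returned schema object, so check these against the MSXML 4.0 SDK documentation before relying on them; listSchemaElements itself is a hypothetical helper name.

```javascript
// Hypothetical helper: list the names of the top-level element declarations
// that a schema cache holds for one namespace.
// Assumes cache.getSchema(), schema.elements.length and schema.elements.item().
function listSchemaElements(cache, namespaceURI)
{
	var schema = cache.getSchema(namespaceURI);
	var names = [];
	for (var i = 0; i < schema.elements.length; i++)
	{
		names.push(schema.elements.item(i).name);
	}
	return names;
}

// Usage in IE (not run here), with the cache built in ValidateXML():
// var names = listSchemaElements(cache, "http://deltabis.com/products");
```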

XPath & XSLT

Beyond implementing the standard XPath 1.0 W3C recommendation, MSXML 4.0 also adds extension functions to support XSDL as well as some other miscellaneous functions.

The XSD-related extension functions include the following.

ms:type-is(URI, local-name)

This allows you to compare the data type of the current node against a given XSD data type.

For example, the following will return true for a node that is an XSD decimal type.

ms:type-is("http://www.w3.org/2001/XMLSchema","decimal")

ms:type-local-name([node-set])

This returns the nonqualified name of the XSD type of the current node or of the first node of a node-set argument.

ms:type-namespace-uri([node-set])

This returns the namespace URI associated with the XSD type of the current node or of the first node of the node-set argument.

ms:schema-info-available()

This function will return true if the XSD Schema is available for the current node.

The assorted miscellaneous functions include the following.

ms:string-compare(string1,string2,language,options)

This function compares string1 and string2 lexicographically (dictionary order) based on the language parameter and the case sensitivity defined in the options parameter. It returns -1 if string1<string2, 0 if string1=string2, and 1 if string1>string2.

For example:

ms:string-compare("a", "A", "en-US") returns -1.

ms:utc(string)

Converts date and time values into Coordinated Universal Time (UTC).

ms:namespace-uri(string)

Takes a qualified name as a string and returns the namespace URI bound to its prefix.

ms:local-name(string)

This function returns the local part of a qualified name string, that is, the name without its namespace prefix.

ms:number(string)

Takes an XSD number and converts it to an XPath number.

For example:

ms:number("5.9e5") returns 5.9e5

ms:format-date(datetime, format, locale)

Takes an XSD date and converts it to the date format specified by the format parameter in the specified locale. The format is based on the Win32 API GetDateFormat() function.

ms:format-time(datetime, format, locale)

Takes an XSD time and converts it to the time format specified by the format parameter in the specified locale. The format is based on the Win32 API GetTimeFormat() function.

If you've worked with MSXML, then you'll be well aware of the XSL implementation (based on the December 1998 W3C Working Draft) available with previous parser releases, identified by the namespace http://www.w3.org/TR/WD-xsl. As of MSXML 4.0, this XSL namespace has been completely dropped, which is of critical importance if you plan to use MSXML and have not upgraded your XSL documents to work with the W3C XSLT 1.0 specification.

We can use the extended XPath functions to get information about the XSDL types defined on our XML instance document. This uses the ms:type-is() function within an XSLT document (sample.xsl) and just outputs the number of elements that have been declared an XSDL "int" type within the XML instance document.


<xsl:transform version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:ms="urn:schemas-microsoft-com:xslt">
  <xsl:template match="/">
There are <xsl:value-of
select="count(//*[ms:type-is('http://www.w3.org/2001/XMLSchema','int')])" />
integer element types in the instance.
  </xsl:template>
</xsl:transform>

In our case, the result is "There are 6 integer element types in the instance." We can create powerful dynamic front ends and web services with fully featured validation simply by interrogating the SOM for a given XML Schema document.

That's the end of the overview. If you want to learn more, I suggest you download the MSXML 4.0 parser and documentation. You might also be interested in my book, XML Application Development with MSXML 4.0.