XML.com: XML From the Inside Out

An Overview of MSXML 4.0

June 04, 2002

In this article we will look at the latest XML parser from Microsoft, MSXML 4.0. Microsoft has taken a lot of criticism in the past over its adoption of non-standard schema and XSL drafts, which happened most often in early parser releases and, to a lesser extent, in version 3.0. MSXML 4.0, however, is a strong attempt by Microsoft to adopt W3C standards: the new parser includes standard DOM, XPath, Schema, and XSLT implementations. There is even full support for SAX 2.0, along with many other objects that improve your productivity when working with XML on the client or the server, and that improve the scalability of server-based XML applications.

Note: To work with the samples in this article you should have MSXML 4.0 installed on your computer, and IE5+ for the purpose of viewing the samples. Unzip the sample files to a directory called "MSXMLFiles".

MSXML DOM

The MSXML DOM implementation exposes an API containing a set of interfaces for loading and parsing documents, working with document nodes, selecting document fragments, and dynamically validating documents against XML Schemas. There are two interfaces which may be used when working with an XML DOMDocument in MSXML. The first, IXMLDOMDocument, implements the W3C DOM Level 1 API with some extensions that provide functionality specified in DOM Level 2. The second, IXMLDOMDocument2, extends IXMLDOMDocument with support for schema caching and validation, as well as some additional property switches for namespace support and better parser performance. When using MSXML from a scripting environment (such as DHTML or ASP), all the methods and properties of these interfaces are available.

Sample XML document

The following listing shows the basic XML file we'll work with in this article.


<?xml version="1.0" encoding="utf-8" ?>
<lists xmlns="http://deltabis.com/products" xmlns:it="http://deltabis.com/itinerary">
	<product sku="8822N" size="small" type="trouser">
		<it:itinerary>
			<it:sold>120</it:sold>
			<it:onhold>45</it:onhold>
			<it:returned>10</it:returned>
		</it:itinerary>
	</product>
	<product sku="9820Y" size="small" type="tshirt">
		<it:itinerary>
			<it:sold>283</it:sold>
			<it:onhold>232</it:onhold>
			<it:returned>23</it:returned>
		</it:itinerary>
	</product>
	<product sku="9922A" size="large" type="cap">
		<it:itinerary>
			<it:sold>342</it:sold>
			<it:onhold>54</it:onhold>
			<it:returned>5</it:returned>
		</it:itinerary>
	</product>
</lists>

It illustrates a simple product catalog with details such as product SKU, size, and type, all of which are in the http://deltabis.com/products namespace. Also included is itinerary information on what has been sold, what is on hold, and what has been returned, all of which is in the http://deltabis.com/itinerary namespace.

Loading a document

Loading a document is simple, as demonstrated by the following:


function LoadDocument()
{
	var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.load("list.xml");
	document.all("result").value = objXML.xml;
}

There is something new here about working with the MSXML 4.0 parser: every DOMDocument instance must be created with a version-dependent ProgID. This allows previous versions of MSXML to work side by side with the new parser without being affected by its installation. To create an MSXML 4.0 version-specific instance of the DOMDocument object in JavaScript, you'd write

var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");

or in VBScript,

Set objXML = CreateObject("MSXML2.DOMDocument.4.0")
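
Because the ProgID is version specific, a page that must also run on machines without MSXML 4.0 might try several ProgIDs in turn. The following is only a sketch of that idea, not something the parser requires: createDomDocument and its createObject parameter are hypothetical names, and the factory is passed in so the logic can be exercised outside Internet Explorer. Keep in mind that features new in 4.0, such as XML Schema validation, are not available if the fallback is used.

```javascript
// Hypothetical helper: try version-specific ProgIDs in order of preference.
// createObject is an injected factory (an assumption of this sketch); inside
// Internet Explorer it would simply wrap "new ActiveXObject(progId)".
function createDomDocument(progIds, createObject)
{
	for (var i = 0; i < progIds.length; i++)
	{
		try
		{
			// Return the ProgID too, so the caller knows which version it got.
			return { progId: progIds[i], doc: createObject(progIds[i]) };
		}
		catch (e)
		{
			// This version is not registered on the machine; try the next one.
		}
	}
	throw new Error("No MSXML DOMDocument could be created");
}

// Possible usage inside an IE-hosted page:
// var result = createDomDocument(
// 	["MSXML2.DOMDocument.4.0", "MSXML2.DOMDocument.3.0"],
// 	function (progId) { return new ActiveXObject(progId); });
// result.doc.async = false;
// result.doc.load("list.xml");
```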

Qualifying XPath Queries using SelectionNamespaces

The SelectionNamespaces property has been available since MSXML 3.0, but it's worth looking at an example of how it can be used in MSXML 4.0. If you look at the sample XML document, you will see there are two namespaces within the document: first, http://deltabis.com/products, the default namespace, and second, http://deltabis.com/itinerary. Imagine you wanted a list of the items sold. The sold element and its ancestors are all namespace-qualified (lists and product by the default namespace, itinerary and sold by the itinerary namespace), and an XPath query that uses unprefixed names can only match elements that are in no namespace. In the sample document everything is qualified, so the code below will not give us the result we are looking for; rather, it will write out zero as the number of nodes selected.


function GetTShirts()
{
	var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.load("list.xml");

	var objNodes = objXML.selectNodes(
"/lists/product[@type='tshirt']/itinerary/sold");
	
	document.all("result").value = objNodes.length;
}

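To get around this, the SelectionNamespaces property is set with the setProperty() method of the DOMDocument object. The prefixes declared there (pro and itin in this example) apply only to the query and need not match the prefixes used in the document; what matters is that they are bound to the same namespace URIs: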


function GetTShirts()
{
	var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.load("list.xml");
	objXML.setProperty("SelectionNamespaces", 
"xmlns:pro='http://deltabis.com/products' xmlns:itin='http://deltabis.com/itinerary'");

	var objNodes = objXML.selectNodes(
"//pro:lists/pro:product[@type='tshirt']/itin:itinerary/itin:sold");

	for (var i=0; i < objNodes.length; i++)
		document.all("result").value += objNodes[i].xml + "\n";
}
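
When several prefixes are involved, the SelectionNamespaces string can be assembled from a prefix-to-URI map rather than typed by hand. The helper below is a minimal sketch of that idea; buildSelectionNamespaces is a hypothetical name, not part of MSXML, and it assumes the URIs contain no single quotes.

```javascript
// Hypothetical helper: turn a { prefix: namespaceURI } map into the
// space-separated xmlns declarations expected by SelectionNamespaces.
function buildSelectionNamespaces(prefixes)
{
	var declarations = [];
	for (var prefix in prefixes)
	{
		if (Object.prototype.hasOwnProperty.call(prefixes, prefix))
		{
			declarations.push("xmlns:" + prefix + "='" + prefixes[prefix] + "'");
		}
	}
	return declarations.join(" ");
}

var selectionNamespaces = buildSelectionNamespaces({
	pro: "http://deltabis.com/products",
	itin: "http://deltabis.com/itinerary"
});

// Inside GetTShirts() the result would be passed straight to the parser:
// objXML.setProperty("SelectionNamespaces", selectionNamespaces);
```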

Additionally, MSXML 4.0 supports the NewParser property, set through the setProperty() method, which instructs MSXML to use a faster parser that does not yet support asynchronous mode or DTD validation. Using the new parser can improve parsing performance by between 200 and 400% for XSLT transformations. The new parser is enabled with the following line of code (the property has been set in sample3.htm):


	objXML.setProperty("NewParser", true); 

We've looked at some of the features of the DOM in MSXML. Let's now look at probably the most significant addition to the MSXML parser: W3C XML Schema support.

W3C XML Schema

To date, MSXML parsers have supported XML Data-Reduced (XDR), an implementation based on a W3C NOTE. MSXML 4.0 continues to support XDR, but it now offers W3C XML Schema Definition Language (XSDL) as its recommended schema language. With the huge number of advantages it offers over XDR, you'd be crazy not to use it anyway -- unless you have legacy code, of course. This article doesn't attempt to cover XSDL itself (see the XML.com Schemas Resource Center), but we will discuss how to validate the sample XML file using MSXML 4.0's XSDL support.

There are two ways to validate instances against XSDL in MSXML 4.0: either declare the schema at the root of the XML instance, or programmatically validate instances against a schema. To perform validation by declaring the schema reference in the XML instance, you must alter the document element of the sample XML to add the schema namespace for instances, which is xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". You must also identify the schema files using the schemaLocation attribute (in the schema instance namespace). The value of this attribute is a whitespace-delimited list of pairs, each consisting of a namespace followed by the location of its schema; if there is more than one namespace to be validated, the pairs are themselves separated by whitespace. The following shows how this would look for the sample XML instance.


<lists xmlns="http://deltabis.com/products" xmlns:it="http://deltabis.com/itinerary"
 	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 	xsi:schemaLocation="http://deltabis.com/products sampleSchema.xsd 
http://deltabis.com/itinerary sampleSchema2.xsd"> 

Notice that the schemaLocation attribute contains the http://deltabis.com/products namespace, then a space, and then the actual schema file, sampleSchema.xsd. There is a similar entry for the http://deltabis.com/itinerary namespace. You can run this sample by opening sample4.htm. The result is shown in Figure 1 below:


Figure 1 -- Successful validation of a document
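
To make the pairing rule for schemaLocation concrete, here is a small sketch that splits an attribute value into namespace/location pairs. It is purely illustrative: MSXML does this work itself when it loads the document, and parseSchemaLocation is a hypothetical name introduced for this example.

```javascript
// Hypothetical helper: split a schemaLocation value into
// { namespace, location } pairs. Any run of whitespace (spaces, tabs,
// line breaks) separates one token from the next.
function parseSchemaLocation(value)
{
	var trimmed = value.replace(/^\s+|\s+$/g, "");
	if (trimmed === "")
	{
		return [];
	}

	var tokens = trimmed.split(/\s+/);
	if (tokens.length % 2 !== 0)
	{
		throw new Error("schemaLocation must contain namespace/location pairs");
	}

	var pairs = [];
	for (var i = 0; i < tokens.length; i += 2)
	{
		pairs.push({ namespace: tokens[i], location: tokens[i + 1] });
	}
	return pairs;
}

// The value from the sample instance, including its line break:
var pairs = parseSchemaLocation(
	"http://deltabis.com/products sampleSchema.xsd \n" +
	"http://deltabis.com/itinerary sampleSchema2.xsd");
```

For the sample value this gives two pairs, one for each namespace in the document.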

The second way to validate an XML instance against XSDL using MSXML 4.0 is the XMLSchemaCache object, which holds a cache of XML Schema documents, each associated with a namespace URI through the object's add() method. The cache is then assigned to the schemas property of the DOMDocument, which uses it to validate a loaded document instance dynamically.
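
As a rough sketch of how these pieces might fit together, the function below fills a schema cache, attaches it to a document, and reports the outcome through the document's parseError object. The function name validateWithCache and its createObject parameter are assumptions made for this example (the factory is injected so the logic can be tried with stand-in objects outside Internet Explorer), and the sketch relies on the parser validating at load time, which is its default behavior.

```javascript
// Hypothetical helper. createObject is an injected factory; inside Internet
// Explorer it would wrap "new ActiveXObject(progId)".
function validateWithCache(createObject, xmlFile, schemas)
{
	// Associate each schema file with its namespace URI.
	var cache = createObject("MSXML2.XMLSchemaCache.4.0");
	for (var i = 0; i < schemas.length; i++)
	{
		cache.add(schemas[i].namespace, schemas[i].location);
	}

	var objXML = createObject("MSXML2.DOMDocument.4.0");
	objXML.async = false;
	objXML.schemas = cache;	// attach the cache before loading
	objXML.load(xmlFile);

	// A non-zero error code means the load or the validation failed.
	var err = objXML.parseError;
	if (err.errorCode != 0)
	{
		return "Invalid: " + err.reason + " (line " + err.line + ")";
	}
	return "Valid";
}

// Possible usage inside an IE-hosted page:
// document.all("result").value = validateWithCache(
// 	function (progId) { return new ActiveXObject(progId); },
// 	"list.xml",
// 	[ { namespace: "http://deltabis.com/products", location: "sampleSchema.xsd" },
// 	  { namespace: "http://deltabis.com/itinerary", location: "sampleSchema2.xsd" } ]);
```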
