An Overview of MSXML 4.0
In this article we will look at the latest XML parser from Microsoft, MSXML 4.0. Microsoft has taken a lot of criticism in the past for adopting non-standard schema and XSL drafts. This happened most often in early parser releases and, to a lesser extent, even in version 3.0. MSXML 4.0, however, is a strong attempt by Microsoft to adopt W3C standards: the new parser includes standard DOM, XPath, XML Schema, and XSLT implementations. There is even full support for SAX 2.0. Many other new objects improve your productivity when working with XML on the client and the server, and also improve the scalability of server-based XML applications.
Note: To work with the samples in this article you should have MSXML 4.0 installed on your computer, and Internet Explorer 5 or later to view the samples. Unzip the sample files to a directory called "MSXMLFiles".
MSXML DOM
The MSXML DOM implementation exposes an API containing a set of interfaces
for loading and parsing documents, working with document nodes, selecting
document fragments, and dynamically validating documents against XML Schemas.
Two interfaces can be used when working with an XML DOMDocument in MSXML.
The first, IXMLDOMDocument, implements the W3C DOM Level 1 API, with some
extensions that provide functionality specified in DOM Level 2. The second,
IXMLDOMDocument2, extends IXMLDOMDocument with support for schema caching
and validation, as well as additional property switches for namespace support
and better parser performance. When you use MSXML from script (for example,
JScript or VBScript in a DHTML page or an ASP page), all the methods and
properties of both interfaces are available.
Sample XML document
The following listing shows the basic XML file we'll work with in this article.
<?xml version="1.0" encoding="utf-8" ?>
<lists xmlns="http://deltabis.com/products" xmlns:it="http://deltabis.com/itinerary">
<product sku="8822N" size="small" type="trouser">
<it:itinerary>
<it:sold>120</it:sold>
<it:onhold>45</it:onhold>
<it:returned>10</it:returned>
</it:itinerary>
</product>
<product sku="9820Y" size="small" type="tshirt">
<it:itinerary>
<it:sold>283</it:sold>
<it:onhold>232</it:onhold>
<it:returned>23</it:returned>
</it:itinerary>
</product>
<product sku="9922A" size="large" type="cap">
<it:itinerary>
<it:sold>342</it:sold>
<it:onhold>54</it:onhold>
<it:returned>5</it:returned>
</it:itinerary>
</product>
</lists>
It illustrates a simple product catalog. Each product element records the
product's SKU, size, and type as attributes. The lists and product elements
are in the default http://deltabis.com/products namespace (the unprefixed
attributes themselves are in no namespace). Each product also contains itinerary
information on what has been sold, what is on hold, and what has been returned;
these elements use the it prefix and are in the
http://deltabis.com/itinerary namespace.
Loading a document
Loading a document is simple, as demonstrated by the following:
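function LoadDocument()
{
// Create a version-specific MSXML 4.0 DOM document
var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
// Load synchronously so the document is ready when load() returns
objXML.async = false;
objXML.load("list.xml");
// Show the serialized document in the page's "result" element
document.all("result").value = objXML.xml;
}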
The LoadDocument() function also shows something new about working with the
MSXML 4.0 parser: every DOMDocument instance must be created with a
version-dependent ProgID. MSXML 4.0 does not register version-independent
ProgIDs, so a ProgID such as MSXML2.DOMDocument still refers to an earlier
version. This allows previous versions of MSXML to work side by side with the
new parser and not be affected by its installation. To create an MSXML
4.0-specific instance of the DOMDocument object in JavaScript, you'd
write
var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
or in VBScript,
Set objXML = CreateObject("MSXML2.DOMDocument.4.0")
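LoadDocument() assumes that list.xml loads successfully. The following is a minimal sketch, not part of the sample files, that checks the return value of load() and, on failure, reads the line and reason properties of the document's parseError object before the document is used. The function name loadDocumentChecked and its createObject parameter are hypothetical. The parameter exists only so that the logic can be exercised outside Internet Explorer; inside a page it defaults to ActiveXObject.

```javascript
// Sketch: load a document with the MSXML 4.0 ProgID and report parse errors
// instead of silently using an empty document.
// createObject is a hypothetical, optional factory; by default it wraps
// the IE/WSH ActiveXObject constructor.
function loadDocumentChecked(url, createObject) {
  var create = createObject || function (progId) {
    return new ActiveXObject(progId);
  };
  var objXML = create("MSXML2.DOMDocument.4.0");
  objXML.async = false;            // synchronous load, as in LoadDocument()
  // load() returns false if the file cannot be read or is not well-formed
  if (!objXML.load(url)) {
    var err = objXML.parseError;   // describes the first error encountered
    return { doc: null, message: "Line " + err.line + ": " + err.reason };
  }
  return { doc: objXML, message: "" };
}
```

In the sample page, the result could then be shown with document.all("result").value = r.doc ? r.doc.xml : r.message, where r is the returned object.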
Qualifying XPath Queries using SelectionNamespaces
The SelectionNamespaces property has been available since
MSXML 3.0, but it's worth looking at an example of how it can be used in MSXML
4.0. The sample XML document contains two namespaces:
http://deltabis.com/products, which is the default namespace, and
http://deltabis.com/itinerary. Imagine you wanted the sold figure for the
T-shirt product. The path to that sold element crosses both namespaces: lists
and product are in the default namespace, while itinerary and sold are in the
itinerary namespace. An unprefixed name in an XPath query matches only elements
that are in no namespace, and in the sample document every element is in a
namespace. So the code below will not give us the result we are looking for;
instead, it will write out zero as the number of nodes selected.
function GetTShirts()
{
var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
objXML.async = false;
objXML.load("list.xml");
var objNodes = objXML.selectNodes(
"/lists/product[@type='tshirt']/itinerary/sold");
document.all("result").value = objNodes.length;
}
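To get around this, set the SelectionNamespaces property with the
setProperty() method of the DOMDocument object. This binds prefixes
to the namespace URIs for use in XPath queries; the prefixes (pro and itin
below) do not need to match the ones used in the document itself: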
function GetTShirts()
{
var objXML = new ActiveXObject("MSXML2.DOMDocument.4.0");
objXML.async = false;
objXML.load("list.xml");
// Bind the prefixes pro and itin to the two namespace URIs for XPath queries
objXML.setProperty("SelectionNamespaces",
"xmlns:pro='http://deltabis.com/products' xmlns:itin='http://deltabis.com/itinerary'");
var objNodes = objXML.selectNodes(
"//pro:lists/pro:product[@type='tshirt']/itin:itinerary/itin:sold");
// Append the markup of each selected node to the "result" element
for (var i=0; i < objNodes.length; i++)
document.all("result").value += objNodes.item(i).xml + "\n";
}
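The SelectionNamespaces value is a single string of namespace declarations of the form xmlns:prefix='uri', separated by spaces. When several prefixes are needed, it can be less error-prone to build that string from a map. The following is a minimal sketch of such a helper. The name buildSelectionNamespaces is hypothetical, and the sketch assumes that the URIs contain no apostrophes, because each URI is wrapped in single quotes.

```javascript
// Hypothetical helper: builds the value passed to
// setProperty("SelectionNamespaces", ...) from a prefix -> namespace URI map.
// Each entry becomes xmlns:prefix='uri'; entries are joined by single spaces.
// Assumes URIs contain no apostrophes (they are wrapped in single quotes).
function buildSelectionNamespaces(prefixMap) {
  var decls = [];
  for (var prefix in prefixMap) {
    if (prefixMap.hasOwnProperty(prefix)) {
      decls.push("xmlns:" + prefix + "='" + prefixMap[prefix] + "'");
    }
  }
  return decls.join(" ");
}

// Usage with an MSXML 4.0 DOMDocument (objXML) in the page:
// objXML.setProperty("SelectionNamespaces", buildSelectionNamespaces({
//   pro: "http://deltabis.com/products",
//   itin: "http://deltabis.com/itinerary"
// }));
```

For the two prefixes in the usage comment, the helper produces the same string that GetTShirts() passes literally.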
Additionally, MSXML 4.0 supports a NewParser property for the
setProperty() method. It instructs MSXML to use a parser that
offers greater performance but does not yet support asynchronous mode or DTD
validation. Using the new parser can improve parsing performance by between
200 and 400% for XSLT transformations. To use the new parser, set the property
before calling load(), as in the following line (sample3.htm sets this
property):
objXML.setProperty("NewParser", true);
We've looked at some of the features of the DOM in MSXML. Let's now look at probably the most significant addition to the MSXML parser: W3C XML Schema support.
W3C XML Schema
To date, MSXML parsers have supported XML-Data Reduced (XDR), an implementation based on a W3C NOTE. MSXML 4.0 continues to support XDR, but it now offers the W3C XML Schema Definition Language (XSDL) as its recommended schema language. With the huge number of advantages XSDL offers over XDR, you'd be crazy not to use it, unless you have legacy code to support. This article doesn't intend to cover XSDL itself (see the XML.com Schemas Resource Center), but we will discuss how you can validate our sample XML file using MSXML 4.0's XSDL support.
There are two ways to validate instances against XSDL in MSXML 4.0: either
declare the schema at the root of the XML instance, or validate instances
against a schema programmatically. To validate by declaring your schema
reference in the XML instance, you must alter the document element of our
sample XML to declare the XML Schema instance namespace,
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance". You must
also identify the schema files using the schemaLocation attribute (in
the schema instance namespace). The value of this attribute is a
whitespace-delimited list of pairs: within each pair, a namespace URI is
followed by the location of its schema, and if more than one namespace is
to be validated, each pair is in turn separated from the next by whitespace.
The following shows how this would look for our sample XML instance.
<lists xmlns="http://deltabis.com/products" xmlns:it="http://deltabis.com/itinerary"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://deltabis.com/products sampleSchema.xsd
http://deltabis.com/itinerary sampleSchema2.xsd">
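MSXML reads this attribute itself, so no script is needed for this approach. To make the pairing rule concrete, however, here is a minimal sketch that splits a schemaLocation value into namespace/location pairs, for example to reuse the same pairs with a schema cache. The function name parseSchemaLocation is hypothetical, and this is not an MSXML API.

```javascript
// Hypothetical helper: splits an xsi:schemaLocation value into
// { namespace, location } pairs. Tokens are separated by any whitespace and
// alternate: namespace URI, schema location, namespace URI, schema location...
function parseSchemaLocation(value) {
  // Trim (without relying on String.prototype.trim), then split on whitespace runs
  var tokens = value.replace(/^\s+|\s+$/g, "").split(/\s+/);
  if (tokens.length === 1 && tokens[0] === "") {
    return [];                     // empty or all-whitespace attribute
  }
  if (tokens.length % 2 !== 0) {
    throw new Error("schemaLocation must contain namespace/location pairs");
  }
  var pairs = [];
  for (var i = 0; i < tokens.length; i += 2) {
    pairs.push({ namespace: tokens[i], location: tokens[i + 1] });
  }
  return pairs;
}
```

Applied to the attribute value above, which spans two lines, it yields two pairs: the products namespace with sampleSchema.xsd and the itinerary namespace with sampleSchema2.xsd.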
Notice that the schemaLocation attribute lists the
http://deltabis.com/products namespace, then a space, then the
actual schema file, sampleSchema.xsd. There's a similar entry for the
http://deltabis.com/itinerary namespace. You can run this sample by
opening sample4.htm. The result is shown in Figure 1 below:

The second way to validate an XML instance against XSDL using MSXML 4.0
is to use the XMLSchemaCache object. This object holds a cache of XML Schema
documents, each associated with a namespace URI through the
add() method of the XMLSchemaCache object. Assigning the cache to the
schemas property of the DOMDocument causes a loaded document
instance to be validated against it dynamically.
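Here is a minimal sketch of this second approach. It assumes that the schema files from the schemaLocation example (sampleSchema.xsd and sampleSchema2.xsd) sit next to list.xml. The MSXML.XMLSchemaCache.4.0 ProgID, add(), schemas, validateOnParse, and parseError are MSXML members. The function name validateWithSchemaCache and its createObject parameter are hypothetical; the parameter exists only so that the flow can be tested without Internet Explorer.

```javascript
// Sketch: validate a document against schemas held in an MSXML 4.0 schema cache.
// schemaMap maps namespace URIs to schema file locations.
// createObject is a hypothetical, optional factory (defaults to ActiveXObject).
function validateWithSchemaCache(xmlUrl, schemaMap, createObject) {
  var create = createObject || function (progId) {
    return new ActiveXObject(progId);
  };
  // Associate each target namespace URI with its XSD file
  var cache = create("MSXML2.XMLSchemaCache.4.0");
  for (var ns in schemaMap) {
    if (schemaMap.hasOwnProperty(ns)) {
      cache.add(ns, schemaMap[ns]);
    }
  }
  var objXML = create("MSXML2.DOMDocument.4.0");
  objXML.async = false;
  objXML.validateOnParse = true;   // the default, set here for clarity
  objXML.schemas = cache;          // attach the cache before loading
  // With a cache attached, load() fails if the instance is invalid
  if (!objXML.load(xmlUrl)) {
    return "Invalid: " + objXML.parseError.reason;
  }
  return "Valid";
}

// Usage in the sample page:
// document.all("result").value = validateWithSchemaCache("list.xml", {
//   "http://deltabis.com/products": "sampleSchema.xsd",
//   "http://deltabis.com/itinerary": "sampleSchema2.xsd"
// });
```

Because the cache is filled in script, the instance document itself does not need the xsi:schemaLocation attribute shown earlier.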