XML and Visual Basic
The Odd Couple
Visual Basic is probably the most commonly used programming language in the world. Certainly Microsoft's "middle-tier" language has become a powerful force in the development of client/server applications, utilities, and similar tools. In great part this is because it works on the assumption that most people do not want to spend a week and several thousand lines of code to produce "Hello, World!" programs--they want something "quick and dirty" that hides the complexity of programming and lets them concentrate on the business side of things.
VB6, the most recent version of Visual Basic, came out as part of Microsoft's Visual Studio package in early 1998. At the time, another language was also in the works--XML. Visual Basic gave almost no acknowledgement that XML was a force to be reckoned with (although VB6 "web classes" amounted to a simple form of what is now recognized as a SAX-based parser, even though their creators had not been aiming for that). As a result, Visual Basic's native support for XML is non-existent--no component can read or produce it, no persistence mechanisms exist around it, no data engines work with it.
However, if that really were the extent of interaction between XML and VB, then the need for an article on the two together would never have arisen! Despite not supporting it natively, VB has proven to be a remarkably effective tool for working with XML. The reason for this is actually pretty simple: the MSXML parser exposes nearly all of the same interfaces in Visual Basic as it does in C++, and the parser that it does expose is arguably one of the best on the market.
A clarification is in order here. Before getting jeered off the stage, I want to note that I'm focusing on the most recent parser, the Microsoft Web Release MSXML3 parser, which is currently still in beta. If you have not yet downloaded this toolkit, do so now! While it won't support all of the features implemented in the W3C specification until its final release later this fall, it implements enough of them that you can begin to see the real power of the W3C XSLT recommendation, XPath, schemas, and more. The older parser relies on specifications that are themselves more than a year out of date (a glacial age in this industry), and the new parser is so superior (even in beta) that there really is no comparison. As you may have guessed, it's also the code that's used in this article.
VB, DOM, and XSLT
The Microsoft parser was one of the first to utilize the notion of a document object model for XML, which in turn went a long way toward establishing the XML Document Object Model for the W3C. While the two (even now) differ somewhat in the capabilities offered, there is enough similarity that anyone familiar with the Java model will be able to jump to the MSXML model (and vice versa).
In Visual Basic, you link in a library such as MSXML by creating a reference to it: select Project > References in the development environment and choose the relevant DLL or EXE file. In the case of the new MSXML parser, you would select the entry "Microsoft XML, v3.0". This makes the MSXML3.dll class library, which contains all of the Microsoft XML functionality, available for use in the development environment.
The DOM itself is built around five primary pieces--documents, nodes, nodelists, elements, and attributes. Each of these has its own programmatic interface (specifically, DOMDocument, IXMLDOMNode, IXMLDOMNodeList, IXMLDOMElement, and IXMLDOMAttribute), and each interface exposes a number of methods and properties. For example, the DOMDocument interface handles both the loading and saving of XML files (through .load, .loadXML--which converts an XML structured string into an internal DOMDocument--and .save), while the IXMLDOMElement interface handles referencing attributes, content text, and element children, and provides a starting point for generating XPath queries.
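As a rough illustration of how these interfaces fit together, the following sketch parses a string with .loadXML, inspects the root element, and writes the tree out with .save. The sample XML, the routine name DomRoundTrip, and the output file name are placeholders invented for this example, and the project is assumed to reference "Microsoft XML, v3.0".

```vb
' A minimal sketch; assumes a project reference to "Microsoft XML, v3.0".
Public Sub DomRoundTrip()
    Dim doc As New DOMDocument
    Dim root As IXMLDOMElement

    doc.async = False

    ' loadXML parses a string and returns False if it is not well formed.
    If Not doc.loadXML("<note priority='high'><to>Reader</to></note>") Then
        Debug.Print "Parse error: " & doc.parseError.reason
        Exit Sub
    End If

    ' documentElement is the root element of the parsed tree.
    Set root = doc.documentElement
    Debug.Print root.nodeName                  ' note
    Debug.Print root.getAttribute("priority")  ' high
    Debug.Print root.childNodes.length         ' 1

    ' save writes the tree to disk; the file name is only a placeholder.
    doc.save App.Path & "\note.xml"
End Sub
```

Running the routine from the Immediate window prints the element name, the attribute value, and the child count, which is a quick way to confirm that the reference to the parser is set up correctly.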
XPath support is a subtle part of the Microsoft DOM that isn't supported in the current W3C DOM specification, but it plays a large part in the versatility of the MSXML parser when dealing with data. XPath is probably most familiar as the language used by XSLT to select nodes within a given XML tree--in essence, to create node-sets. The terminology was not yet nailed down when Microsoft's DOM was first proposed, so Microsoft referred to these node-sets as "nodeLists." The parser supports an IXMLDOMNodeList interface that can be used to reference a list of nodes, as well as the new IXMLDOMSelection, which serves as a way to deal with the nodes as a single unit.
The ability to pull an arbitrary selection of nodes out of an XML DOM is what makes the parser so powerful. Because the MSXML parser can perform XPath queries on a set of data, it can effectively reference any nodes within the XML tree, based upon (sometimes highly sophisticated) queries. The parser provides two methods to do this: .selectNodes(), which retrieves every node that satisfies the XPath query, and .selectSingleNode(), which retrieves only the first node found. In both cases the query is evaluated relative to the node on which the method is called.
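The difference between the two methods, and the way a query is evaluated relative to its context node, might be sketched as follows. The inline document and the routine name SelectionSketch are invented for illustration. One caveat worth stating: MSXML3 interprets selection expressions using the older XSL pattern syntax unless the SelectionLanguage property is set to XPath, so the sketch sets it explicitly.

```vb
' A minimal sketch; assumes a project reference to "Microsoft XML, v3.0".
Public Sub SelectionSketch()
    Dim doc As New DOMDocument
    Dim sections As IXMLDOMNodeList
    Dim firstSection As IXMLDOMNode

    doc.async = False
    doc.loadXML "<article>" & _
        "<section><title>One</title><para>First.</para></section>" & _
        "<section><title>Two</title><para>Second.</para></section>" & _
        "</article>"

    ' Ask for XPath 1.0 rather than the legacy XSL pattern syntax.
    doc.setProperty "SelectionLanguage", "XPath"

    ' selectNodes returns every match as a node list.
    Set sections = doc.selectNodes("//section")
    Debug.Print sections.length                                ' 2

    ' selectSingleNode returns only the first match.
    Set firstSection = doc.selectSingleNode("//section")

    ' A relative query is evaluated from the node it is called on.
    Debug.Print firstSection.selectSingleNode("title").Text    ' One

    ' When nothing matches, selectSingleNode returns Nothing.
    If doc.selectSingleNode("//appendix") Is Nothing Then
        Debug.Print "no appendix element"
    End If
End Sub
```

Checking for Nothing before touching the result is worth the extra line, since calling .Text on a missing node raises a run-time error.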
An example is perhaps in order here. Consider a very simple application that can read an XML document, populate a list box with the titles of each section, then, when a section is clicked, output the section's text into a text box. While perhaps not very exciting, this example illustrates how you could build more complex applications that take advantage of the XPath strategy to retrieve information.
The XML document is retained in a public variable called articleDoc, which is declared and initialized as a DOMDocument (Public articleDoc As New DOMDocument).
Figure 1: Application User Interface
The application itself uses a simple menu, a list box, a text box, and the Common Dialog control for selecting the XML file in the first place, as laid out in Figure 1. When a user selects Open from the menu, the mnuFileOpen_Click event handler gets called, which in turn calls the function LoadDocument(). This routine first sets the async property of the DOM to False so that the XML document loads synchronously--execution pauses until either the document is completely loaded in memory, or an error occurs. Then it loads the requested document and sets the caption of the form (the title bar of the window) to be the title of the document:
Public articleDoc As New DOMDocument

Private Sub mnuFileOpen_Click()
    ' Offer both XML and XSL files in the Open dialog.
    cdlg.Filter = "XML Files (*.xml,*.xsl)|*.xml;*.xsl"
    cdlg.ShowOpen
    If cdlg.FileName <> "" Then
        LoadDocument cdlg.FileName
    End If
End Sub

Function LoadDocument(url As String)
    ' Load synchronously so the tree is complete before it is queried.
    articleDoc.async = False
    articleDoc.Load url
    Me.Caption = articleDoc.selectSingleNode( _
        "//head/title").Text
    loadSelections
End Function
Notice here the use of selectSingleNode() to retrieve the title child of the head element, which holds the title of the document itself. The .text property retrieves the text contents of the element's subtree.
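The loading routine assumes that the document parses cleanly and that a //head/title element exists. A more defensive variant might check the Boolean result of .load and report the details held in the parseError object. The sketch below is one way to do that; the function name TryLoad and the wording of the messages are assumptions made for this example, not part of the application described here.

```vb
' A minimal sketch; assumes a project reference to "Microsoft XML, v3.0".
' Returns True if the document at url was loaded into doc.
Public Function TryLoad(doc As DOMDocument, url As String) As Boolean
    ' Synchronous loading: load does not return until parsing has finished.
    doc.async = False

    If doc.Load(url) Then
        TryLoad = True
    Else
        ' parseError describes what went wrong and where.
        With doc.parseError
            MsgBox "Could not load " & url & vbCrLf & _
                   .reason & vbCrLf & _
                   "Line " & .Line & ", position " & .linepos, _
                   vbExclamation
        End With
        TryLoad = False
    End If
End Function
```

A caller could then write If TryLoad(articleDoc, cdlg.FileName) Then ... and set the caption only when the load succeeded and selectSingleNode("//head/title") did not return Nothing.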
Once the document is loaded, the loadSelections routine gets called. This actually populates the list box, and demonstrates the use of selectNodes for retrieving specific node information:
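Public Function loadSelections()
    Dim sections As IXMLDOMNodeList
    Dim sectionNode As IXMLDOMNode

    Set sections = articleDoc.selectNodes("//section")
    SectionsList.Clear
    For Each sectionNode In sections
        ' Each title is looked up relative to its own section node.
        SectionsList.AddItem _
            sectionNode.selectSingleNode("title").Text
    Next
    ' Selecting the first entry fires SectionsList_Click.
    If SectionsList.ListCount > 0 Then SectionsList.ListIndex = 0
End Function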
The selectNodes method retrieves a node list consisting of pointers to each of the section nodes. The list control (called SectionsList) is then cleared, and VB loops through each section node in turn, adding to the control the text of the title associated with the section node. Finally, the routine selects the zeroth index (the first position in the list), which in turn fires the Click event for the SectionsList control:
Private Sub SectionsList_Click()
    DisplayContents SectionsList.Text
End Sub

Public Function DisplayContents(sectionName As String)
    Dim sectionNode As IXMLDOMElement
    Dim buffer As String
    Dim subElt As IXMLDOMElement

    ' Find the section whose title matches the selected list entry.
    Set sectionNode = articleDoc.selectSingleNode( _
        "//section[title='" & sectionName & "']")
    If sectionNode Is Nothing Then Exit Function

    ' Join the text of each child element, separated by blank lines.
    For Each subElt In sectionNode.selectNodes("*")
        buffer = buffer & subElt.Text & vbCrLf & vbCrLf
    Next
    SectionContentText.Text = buffer
End Function
This example serves as a good demonstration of the power of XPath. The title of each section should be distinct. As a consequence, it is possible to uniquely identify a given section by its title, then to use selectSingleNode() to retrieve that node. It should be noted that this actually echoes an equivalent paradigm in XSLT:
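One practical wrinkle with building the query by string concatenation is that a section title containing an apostrophe would break the quoted literal, and XPath 1.0 has no escape character for quotes. A helper along the following lines could build a safe literal; the name XPathLiteral is hypothetical, and the concat() fallback assumes that the SelectionLanguage property of the document has been set to XPath.

```vb
' A minimal sketch: wraps a string as an XPath 1.0 string literal.
Public Function XPathLiteral(value As String) As String
    If InStr(value, "'") = 0 Then
        ' No apostrophes: single quotes are safe delimiters.
        XPathLiteral = "'" & value & "'"
    ElseIf InStr(value, """") = 0 Then
        ' Apostrophes but no double quotes: delimit with double quotes.
        XPathLiteral = """" & value & """"
    Else
        ' Both kinds of quote: split on apostrophes and rejoin with
        ' concat(), so that a'b becomes concat('a',"'",'b').
        XPathLiteral = "concat('" & _
            Replace(value, "'", "',""'"",'") & "')"
    End If
End Function
```

With such a helper, the query in DisplayContents could be written as "//section[title=" & XPathLiteral(sectionName) & "]".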
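<xsl:param name="sectionName"/>

<!-- XSLT 1.0 does not allow variable references in a match pattern,
     so the parameter is tested in a select expression instead. -->
<xsl:template match="/">
    <xsl:for-each select="//section[title=$sectionName]/*">
        <xsl:value-of select="."/>
    </xsl:for-each>
</xsl:template>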
One lesson from this is that if you are stymied by an XSLT problem, try writing the problem out as a procedural problem first, then apply the tags. This doesn't always work--the recursive nature of XML is often better handled by XSLT than it is by a language like VB--but it can prove a good starting point. Additionally, elements in an XML document do not always have explicit IDs associated with them, but you can use XPath expressions to pull out elements that have some unique characteristic (such as the content of the title element), or you can generate IDs through such methods as the generate-id() function in XSLT.