Overriding Concerns

November 26, 2003

Q: How do I merge two XML source trees into one?

I've tried so many things that it's driving me crazy: I want to merge or join two XML files. Which two files are to be merged are specified in a third file (call it merge.xml). This third file looks like this:

<?xml version="1.0"?>

<merge>

  <appxml>testapp.xml</appxml>

  <userxml>user.xml</userxml>

</merge>

And here are the two files to which merge.xml refers. First, testapp.xml:

<app name="testapp" lifetime="900">

  <mainmenu>

    <menu id="1" caption="test"/>

    <menu id="2" caption="another test"/>

  </mainmenu>

  <forms>

    <testform autosize="1"/>

    <testform2 autosize="0"/>

  </forms>

</app>

And here is user.xml:

<app lifetime="100">

  <mainmenu>

    <menu id="2" caption="my test"/>

    <menu id="3" caption="my menu"/>

  </mainmenu>

  <forms>

    <testform2 autosize="1"/>

  </forms>

</app>

The result must be:

<app name="testapp" lifetime="100">

  <mainmenu>

    <menu id="1" caption="test"/>

    <menu id="2" caption="my test"/>

    <menu id="3" caption="my menu"/>

  </mainmenu>

  <forms>

    <testform autosize="1"/>

    <testform2 autosize="1"/>

  </forms>

</app>

I'm using merge.xml as the source tree for the transformation. Basically I use the string-values of the appxml and userxml elements as input to the document() function, like this:

<xsl:template match="merge" >

  <xsl:variable name="app_xml" select="string(appxml)" />

  <xsl:variable name="user_xml" select="string(userxml)" />

  <xsl:call-template name="domerge">

    <xsl:with-param name="app_nodes" select="document($app_xml)" />

    <xsl:with-param name="user_nodes" select="document($user_xml)" />

  </xsl:call-template>

</xsl:template>

As you can see, I want to process the nodes with a named template called domerge. But what the heck should domerge contain?

A: Although you didn't say as much in your question, the basic problem is evidently how to use the contents of user.xml to override those of testapp.xml in the result tree. That is, testapp.xml establishes rules for how some application is to behave, and user.xml permits some or all of those rules to be overridden.

This is one of my favorite uses for simple XML. You've got an additional twist -- the third file, which specifies which files to use -- but the basic approach is the same.

To start with, here's a basic domerge named template:

<xsl:template name="domerge">

  <xsl:param name="app_nodes" />

  <xsl:param name="user_nodes" />

  <app>

    <mainmenu>

      <xsl:copy-of select="$app_nodes//menu" />

      <xsl:copy-of select="$user_nodes//menu" />

    </mainmenu>

    <forms> 

      <xsl:copy-of select="$app_nodes//forms/*" />

      <xsl:copy-of select="$user_nodes//forms/*" />

    </forms>

  </app>

</xsl:template>

This doesn't do everything you need, but it gets you part of the way there. It begins by declaring the two parameters app_nodes and user_nodes, whose values you're supplying in the xsl:call-template element you've already constructed. Then it establishes the basic structure of the result tree -- an app root element, with mainmenu and forms child elements. Within each of those two children, it instantiates copies of the corresponding portions of testapp.xml and user.xml. The result tree from the stylesheet looks like this, so far:

<app>

  <mainmenu>

    <menu id="1" caption="test"/>

    <menu id="2" caption="another test"/>

    <menu id="2" caption="my test"/>

    <menu id="3" caption="my menu"/>

  </mainmenu>

  <forms>

    <testform autosize="1"/>

    <testform2 autosize="0"/>

    <testform2 autosize="1"/>

  </forms>

</app>

The problems with this result tree are twofold. First, it doesn't yet include any attributes for the app element. Second, the elements from testapp.xml which are overridden by user.xml (the menu element with id="2" and caption="another test", and the testform2 element with autosize="0") shouldn't be appearing at all.

Let's start with those attributes for the app element, name and lifetime. (Your sample code doesn't indicate that name can be overridden in user.xml, but I assume it can.) What you need to do is build each attribute using the one in testapp.xml unless the same attribute appears in user.xml. Here's one approach; the new code is the pair of xsl:attribute elements:

<app>

  <xsl:attribute name="name">

    <xsl:choose>

      <xsl:when test="$user_nodes/app/@name">

        <xsl:value-of select="$user_nodes/app/@name"/>

      </xsl:when>

      <xsl:otherwise>

        <xsl:value-of select="$app_nodes/app/@name"/>

      </xsl:otherwise>

    </xsl:choose>

  </xsl:attribute>

  <xsl:attribute name="lifetime">

    <xsl:choose>

      <xsl:when test="$user_nodes/app/@lifetime">

        <xsl:value-of select="$user_nodes/app/@lifetime"/>

      </xsl:when>

      <xsl:otherwise>

        <xsl:value-of select="$app_nodes/app/@lifetime"/>

      </xsl:otherwise>

    </xsl:choose>

  </xsl:attribute>

  [etc. as above]

</app>

What I've added here is a pair of xsl:attribute elements, which instantiate the name and lifetime attributes in the result tree and then assign their values (using an xsl:choose block for each attribute) depending on whether those attributes have been assigned values in user.xml. As desired, the start tag of the result tree's app element now looks like this:

<app name="testapp" lifetime="100">
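
If you expect more overridable attributes on app later, a shorter route may be worth considering. The following is only a sketch, not a replacement for the approach above: it assumes that every attribute on app may be overridden, and it relies on the XSLT 1.0 rule that adding an attribute to an element replaces any attribute of the same name already added to it. To keep it self-contained I've put it directly in a template matching merge, with merge.xml as the source tree as in your own stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="merge">
    <xsl:variable name="app_nodes" select="document(string(appxml))"/>
    <xsl:variable name="user_nodes" select="document(string(userxml))"/>
    <app>
      <!-- Defaults first: every attribute of app in testapp.xml -->
      <xsl:copy-of select="$app_nodes/app/@*"/>
      <!-- Overrides second: an attribute with the same name
           replaces the one copied above -->
      <xsl:copy-of select="$user_nodes/app/@*"/>
    </app>
  </xsl:template>

</xsl:stylesheet>
```

Both xsl:copy-of elements have to come before anything that adds child elements to app, because attributes can't be added to an element once it has children.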

Fixing the other problem with the named template so far -- that the elements from testapp.xml which were overridden by user.xml are still showing up in the result tree -- will be a little trickier. The problem is that the plain-old xsl:copy-of elements are too indiscriminate. For processing the menu elements, you can do something like this:

<app>

  [etc. as above]

  <mainmenu>

    <xsl:for-each select="$app_nodes//menu">

      <xsl:choose>

        <xsl:when test="$user_nodes//menu/@id[.=current()/@id]"/>

        <xsl:otherwise><xsl:copy-of select="."/></xsl:otherwise>

      </xsl:choose>

    </xsl:for-each>

    <xsl:copy-of select="$user_nodes//menu" />

  </mainmenu>

  [etc. as above]

</app>

Here the xsl:copy-of for all the testapp.xml menu elements has been replaced by an xsl:for-each which examines each of those menu elements in turn. If the current menu element's id attribute is matched by one in the user.xml tree, the template does nothing; otherwise, it instantiates in the result tree (via a simple xsl:copy-of) a copy of the current menu element (again, from the testapp.xml file). Note also that the xsl:copy-of for the menu elements in user.xml hasn't been changed at all; all of those menu elements go straight into the result.
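
For the menu elements specifically, the same filtering can probably be written more compactly. This is a sketch of an alternative, not the approach used in the rest of this answer. It leans on the way XPath 1.0 compares node-sets: the expression @id = $user_nodes//menu/@id is true if the menu's id equals the id of any menu in user.xml, so wrapping it in not() keeps only the testapp.xml menus that have no override. As before, it assumes merge.xml is the source tree.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="merge">
    <xsl:variable name="app_nodes" select="document(string(appxml))"/>
    <xsl:variable name="user_nodes" select="document(string(userxml))"/>
    <mainmenu>
      <!-- testapp.xml menus whose id matches no menu id in user.xml -->
      <xsl:copy-of
          select="$app_nodes//menu[not(@id = $user_nodes//menu/@id)]"/>
      <!-- every menu from user.xml, unconditionally -->
      <xsl:copy-of select="$user_nodes//menu"/>
    </mainmenu>
  </xsl:template>

</xsl:stylesheet>
```

Inside the predicate, @id belongs to the testapp.xml menu being filtered, so no current() is needed here. The trick works only because the match is on attribute values; it doesn't carry over directly to matching by element name.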

Handling the form elements -- whether overridden by user.xml or not -- is similar to the solution for the menu elements. (There aren't really any form elements as such in either testapp.xml or user.xml; instead, there are testform, testform2, etc. elements. I'm referring to them as form elements just as a sort of collective shorthand.) But menu elements were "matched" (or not) between the two input files by way of their id attributes' values; the key to matching form elements is their element names. For example, testapp.xml has testform and testform2 elements; user.xml, only a testform2. So the structure for processing these elements is similar to that for processing the menu elements, but with a different test attribute in the xsl:when element:

<app>

  [etc. as above]

  <forms> 

    <xsl:for-each select="$app_nodes//forms/*">

      <xsl:choose>

        <xsl:when test="$user_nodes//forms/*[name()=name(current())]"/>

        <xsl:otherwise><xsl:copy-of select="."/></xsl:otherwise>

      </xsl:choose>

    </xsl:for-each>

    <xsl:copy-of select="$user_nodes//forms/*" />

  </forms>

</app>

As before, the assumption is that all form elements from user.xml get transcribed straight to the result tree; it's only the testapp.xml forms which need to be tested for inclusion.

One additional note about the xsl:when test attributes for the menu and form elements: they both use the current() function to refer to the node in testapp.xml currently being processed by their respective xsl:for-each loops. Inside an XPath predicate this is often necessary, because the context node at that point is often different from the current node. For example, the context node in either of these two predicates is a candidate node from the user.xml file, not the respective element from testapp.xml -- and it's the latter which needs to be tested before copying to the result tree.
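
If current() feels opaque, one common alternative is to capture the value you need in a variable before entering the predicate. The sketch below does this for the form elements; the variable name form_name is my own invention, and xsl:if with not() stands in for the empty xsl:when used above. It assumes merge.xml is the source tree.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="merge">
    <xsl:variable name="app_nodes" select="document(string(appxml))"/>
    <xsl:variable name="user_nodes" select="document(string(userxml))"/>
    <forms>
      <xsl:for-each select="$app_nodes//forms/*">
        <!-- Outside any predicate, the context node is still the
             testapp.xml element, so name() is its name -->
        <xsl:variable name="form_name" select="name()"/>
        <!-- Inside the predicate, name() is the name of each
             user.xml element being compared -->
        <xsl:if test="not($user_nodes//forms/*[name() = $form_name])">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
      <xsl:copy-of select="$user_nodes//forms/*"/>
    </forms>
  </xsl:template>

</xsl:stylesheet>
```

The variable and current() do the same job: each carries a value from the node being looped over into a predicate whose context node is something else.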

For reference, here's the final domerge named template:

<xsl:template name="domerge">

  <xsl:param name="app_nodes" />

  <xsl:param name="user_nodes" />

  <app>

    <xsl:attribute name="name">

      <xsl:choose>

        <xsl:when test="$user_nodes/app/@name">

          <xsl:value-of select="$user_nodes/app/@name"/>

        </xsl:when>

        <xsl:otherwise>

          <xsl:value-of select="$app_nodes/app/@name"/>

        </xsl:otherwise>

      </xsl:choose>

    </xsl:attribute>

    <xsl:attribute name="lifetime">

      <xsl:choose>

        <xsl:when test="$user_nodes/app/@lifetime">

          <xsl:value-of select="$user_nodes/app/@lifetime"/>

        </xsl:when>

        <xsl:otherwise>

          <xsl:value-of select="$app_nodes/app/@lifetime"/>

        </xsl:otherwise>

      </xsl:choose>

    </xsl:attribute>

    <mainmenu>

      <xsl:for-each select="$app_nodes//menu">

        <xsl:choose>

          <xsl:when test="$user_nodes//menu/@id[.=current()/@id]"/>

          <xsl:otherwise><xsl:copy-of select="."/></xsl:otherwise>

        </xsl:choose>

      </xsl:for-each>

      <xsl:copy-of select="$user_nodes//menu" />

    </mainmenu>

    <forms>

      <xsl:for-each select="$app_nodes//forms/*">

        <xsl:choose>

          <xsl:when test="$user_nodes//forms/*[name()=name(current())]"/>

          <xsl:otherwise><xsl:copy-of select="."/></xsl:otherwise>

        </xsl:choose>

      </xsl:for-each>

      <xsl:copy-of select="$user_nodes//forms/*" />

    </forms>

  </app>

</xsl:template>

The result tree matches your desired output.