String analysis with XSLT's analyze-string
July 22, 2024
XSLT 3.0 xsl:analyze-string instruction and XPath 3.1 analyze-string function
1) Introduction
XSLT (XSL Transformations) is a template-oriented language for transforming XML (Extensible Markup Language) and text documents into result formats such as XML, text, HTML and XHTML. XSLT programs (also known as XSLT stylesheets) are written in XML syntax and rely on XML namespaces. The XSLT 3.0 specification defines an XML Schema for XSLT stylesheets, and describes the syntax and semantics of the XSLT language instructions.
This article assumes that the reader is familiar with XML and XML namespaces technologies.
An XSLT transformation can add, modify, delete and filter information from the input documents that a stylesheet processes; these operations are the primary purpose of XSLT stylesheets.
XSLT 1.0, the first version of XSLT, became a W3C recommendation on 16 November 1999. The two subsequent W3C recommendations of the XSLT language are versions 2.0 (W3C recommendation 23 January 2007) and 3.0 (W3C recommendation 8 June 2017). All XSLT language versions are widely used in software applications. As with any other software technology, the latest version of XSLT (i.e., 3.0) has many more language features than versions 1.0 and 2.0. XSLT 3.0 is largely compatible with XSLT 2.0 in terms of the data model used by XSLT. This data model is known as the XPath (XML Path Language) data model; it defines, for example, the various XPath node kinds and the data types that are available.
This article explains in detail XSLT 3.0's xsl:analyze-string instruction and the XPath 3.1 analyze-string function, which are useful XSL language features for analyzing XML and text string information.
2) Features common between the XSLT xsl:analyze-string instruction and the XPath analyze-string function, and their similarities and differences with the XPath tokenize function
XSLT's xsl:analyze-string instruction and the XPath function analyze-string have similar objectives. Both of these XSL language features require an input string, which is analyzed and split into substrings, and a regular expression. A regular expression (very often called a regex) is a string pattern that may match zero or many strings (for example, the regex [a-z]+ matches any non-empty run of the lower-case alphabet characters 'a' to 'z'). Both of these XSL language features are conceptually similar to a string tokenizer like the XPath tokenize function, but with certain important differences that are explained below in this section.
It's useful to remember the functionality of the XPath tokenize function when deciding whether to use the xsl:analyze-string instruction or the XPath analyze-string function. The XPath tokenize function takes as input a string to be tokenized and a regular expression that identifies the separators. It scans the input string from left to right, breaks it at the substrings matched by the regular expression, and returns the sequence of substrings found between those separators. Very often, a string tokenizer is needed in software applications to split an input string into a sequence of tokens that are the words of the input string.
Both the xsl:analyze-string instruction and the XPath analyze-string function can do the same tasks as the XPath tokenize function, but that is only a subset of their features.
The XSLT xsl:analyze-string instruction and the XPath analyze-string function can emit both the matching and the non-matching substrings of an input string at the regex boundaries, working from left to right through the input string. The XPath tokenize function can emit only those parts of an input string that are not matched by the tokenize function's regex argument, i.e., the parts of the input string that are matched by the regex are not available in the tokenize function's output.
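As a minimal sketch of this limitation (the stylesheet below is illustrative only and is not one of this article's numbered examples; the element names tokens and token are hypothetical), tokenizing a literal string on whitespace returns only the three words, while the whitespace separators themselves are discarded:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- The string is a literal, so any XML input document can be used -->
  <xsl:template match="/">
    <tokens>
      <!-- tokenize returns the sequence ('XSLT', 'xsl:analyze-string', 'instruction') -->
      <xsl:for-each select="tokenize('XSLT xsl:analyze-string instruction', '\s+')">
        <token><xsl:value-of select="."/></token>
      </xsl:for-each>
    </tokens>
  </xsl:template>
</xsl:stylesheet>
The result contains three token elements and no trace of the two space characters that separated the words.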
We'll study XSLT's xsl:analyze-string instruction and XPath's analyze-string function in detail, with examples.
3) XSLT 3.0 xsl:analyze-string instruction
The XSLT xsl:analyze-string instruction has the following syntax, which an XSLT stylesheet author needs to follow when using the instruction in XSL stylesheets:
<xsl:analyze-string select="..." regex="..." flags="...">
<xsl:matching-substring>
...
</xsl:matching-substring>
<xsl:non-matching-substring>
...
</xsl:non-matching-substring>
</xsl:analyze-string>
An XSLT xsl:analyze-string instruction has the following requirements:
1) An xsl:analyze-string stylesheet element can have the following attributes: 'select', 'regex' and 'flags'. The 'select' and 'regex' attributes are mandatory on an xsl:analyze-string element, whereas the 'flags' attribute is optional.
2) An xsl:analyze-string stylesheet element must have one or both of the elements xsl:matching-substring and xsl:non-matching-substring as child elements. Each of the elements xsl:matching-substring and xsl:non-matching-substring can appear only once in an xsl:analyze-string element. If both are present, then the xsl:matching-substring element must be written before the xsl:non-matching-substring element.
It's useful to know that when an xsl:analyze-string instruction contains only an xsl:non-matching-substring as its child element, the xsl:analyze-string instruction functions very similarly to XPath's tokenize function.
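The following sketch illustrates this tokenizer-like usage (it is not one of this article's numbered examples; the literal input string and the element names tokens and token are assumptions made for illustration):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- The string is a literal, so any XML input document can be used -->
  <xsl:template match="/">
    <tokens>
      <xsl:analyze-string select="'XSLT xsl:analyze-string instruction'" regex="\s+">
        <!-- Only the text between whitespace runs is processed; the matches are ignored -->
        <xsl:non-matching-substring>
          <token><xsl:value-of select="."/></token>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </tokens>
  </xsl:template>
</xsl:stylesheet>
This sketch emits three token elements containing XSLT, xsl:analyze-string and instruction. One difference from tokenize is worth keeping in mind: xsl:analyze-string never processes zero-length substrings, whereas the two-argument tokenize function returns a zero-length string when the input starts or ends with a separator (for example, tokenize(' a b ', '\s+') returns four strings, the first and last of which are empty).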
Both the XSL stylesheet elements xsl:matching-substring and xsl:non-matching-substring can produce an arbitrary stylesheet output structure (for example, XML or HTML content), a sequence of data values or even a single atomic value. Any of this output may be constructed dynamically or statically by the stylesheet.
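As an illustrative sketch of dynamically constructed output (the date string and the element names date, year, month and day are assumptions made for this example, not part of this article's numbered examples), the body of xsl:matching-substring can call the XSLT function regex-group to read the substrings captured by the parenthesized groups of the regex:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- The string is a literal, so any XML input document can be used -->
  <xsl:template match="/">
    <xsl:analyze-string select="'2008-12-03'" regex="(\d+)-(\d+)-(\d+)">
      <xsl:matching-substring>
        <date>
          <!-- regex-group(n) returns the text captured by the n-th parenthesized group -->
          <year><xsl:value-of select="regex-group(1)"/></year>
          <month><xsl:value-of select="regex-group(2)"/></month>
          <day><xsl:value-of select="regex-group(3)"/></day>
        </date>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
Here the whole input string is a single match, so the sketch is expected to produce one date element whose year, month and day children contain 2008, 12 and 03 respectively.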
An xsl:analyze-string instruction's output can start with either the matching (produced by the XSL instruction xsl:matching-substring) or the non-matching (produced by the XSL instruction xsl:non-matching-substring) string content from the input string (the computed value of the xsl:analyze-string element's 'select' attribute) that is processed by the instruction. The output starts with the matching information if the beginning of the input string matches the xsl:analyze-string instruction's regex, and with the non-matching information if the beginning of the input string does not match the regex.
The xsl:analyze-string instruction's output typically alternates between matching and non-matching substring information, because of how a regex naturally partitions an input string: with regexes such as those used in this article, a matching part of the input string is followed by a non-matching part, and vice versa. (When two matches are directly adjacent in the input string, they are reported one after the other, since zero-length non-matching substrings are not processed.) As mentioned in the previous paragraph, an xsl:analyze-string instruction's output can start with either the matching substring information or the non-matching substring information.
Let's study the xsl:analyze-string instruction's behavior further with a few XSLT stylesheet examples, shown below in this section.
XML document [XML1]:
<?xml version="1.0" encoding="UTF-8"?>
<info>XSLT xsl:analyze-string instruction</info>
XSL stylesheet document [XSL1]:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<!-- The regex used by the xsl:analyze-string instruction below matches
a contiguous sequence of one or more whitespace characters -->
<xsl:template match="/">
<stringRegexAnalysis>
<xsl:analyze-string select="info" regex="\s+">
<xsl:matching-substring>
<matchPart>
<xsl:value-of select="."/>
</matchPart>
</xsl:matching-substring>
<xsl:non-matching-substring>
<nonMatchPart>
<xsl:value-of select="."/>
</nonMatchPart>
</xsl:non-matching-substring>
</xsl:analyze-string>
</stringRegexAnalysis>
</xsl:template>
</xsl:stylesheet>
When the XSL stylesheet [XSL1] transforms an XML input document [XML1], the following stylesheet output is produced:
<?xml version="1.0" encoding="UTF-8"?>
<stringRegexAnalysis>
<nonMatchPart>XSLT</nonMatchPart>
<matchPart> </matchPart>
<nonMatchPart>xsl:analyze-string</nonMatchPart>
<matchPart> </matchPart>
<nonMatchPart>instruction</nonMatchPart>
</stringRegexAnalysis>
The XSL stylesheet's output shown above should appear self-explanatory. In this XSL stylesheet transformation example, the xsl:analyze-string instruction has produced an alternating sequence of non-matching and matching substring information. The output starts with the non-matching substring information, because the first part of the input string (the word "XSLT") does not match the regex.
It's useful to remember that a regex always corresponds to zero or more matching substrings of an input string. By changing the regex value of an xsl:analyze-string instruction, we can turn the substrings that one regex reported as non-matching into the matching substrings of a different regex. This depends on how an XSL stylesheet author chooses the regex value to be used with the xsl:analyze-string instruction.
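Let's assume that we change the stylesheet XSL1's regex to [\w|:|\-]+ (which specifies a contiguous sequence of word characters, together with the characters ':' and '-'). XPath regular expressions define \w through Unicode character categories (every character except punctuation, separator and "other" characters), so for ASCII text it covers the letters and digits. Note also that inside a character class the '|' character is a literal rather than an alternation operator, so this regex additionally matches '|'; the input string used here contains no such character. With this regex, the XSL transformation of the XML input document XML1 produces the following output: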
<?xml version="1.0" encoding="UTF-8"?>
<stringRegexAnalysis>
<matchPart>XSLT</matchPart>
<nonMatchPart> </nonMatchPart>
<matchPart>xsl:analyze-string</matchPart>
<nonMatchPart> </nonMatchPart>
<matchPart>instruction</matchPart>
</stringRegexAnalysis>
For this input string, the two regexes are complementary: each substring that matched the previous regex \s+ is a non-matching substring under the new regex [\w|:|\-]+, and each substring that did not match the previous regex is identified by the new regex as a matching substring.
An XSL stylesheet using an xsl:analyze-string instruction doesn't necessarily have to output technical names for XML elements, for example "matchPart", "nonMatchPart" etc. The following XSLT stylesheet illustrates producing user-friendly XML element names in the XSL stylesheet's transformation output.
Let's say we have the following XSL stylesheet document ([XSL2]):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<wordsWithinAString>
<xsl:analyze-string select="info" regex="[\w|:|\-]+">
<xsl:matching-substring>
<word>
<xsl:value-of select="."/>
</word>
</xsl:matching-substring>
</xsl:analyze-string>
</wordsWithinAString>
</xsl:template>
</xsl:stylesheet>
The XSL stylesheet XSL2 shown above, when transforming an XML input document XML1, produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<wordsWithinAString>
<word>XSLT</word>
<word>xsl:analyze-string</word>
<word>instruction</word>
</wordsWithinAString>
The regex value used in the XSL stylesheet XSL2 shown above is the same as that used in the previous example, but the XSL stylesheet XSL2 produces a more user-friendly XSL transformation output. It also produces only the words found in the input string: because only the xsl:matching-substring element is present as a child of the xsl:analyze-string element, the whitespace between the words is dropped.
The regex [\w|:|\-]+ used in the previous XSL stylesheet example included the characters ':' and '-' for technical illustration. For similar requirements, an XSL stylesheet author more often uses the regex value \w+ (equivalently [\w]+), which identifies substrings formed only of word characters. Using this simpler regex produces a few additional matching substrings in the XSL transformation output, because ':' and '-' now act as separators. The XSLT stylesheet language has various other features by which an XSL stylesheet author can post-process the result of the xsl:analyze-string instruction if needed.
When the regex value of the xsl:analyze-string instruction in the stylesheet XSL2 is changed to \w+, the XSL stylesheet's output for the XML input document XML1 is the following:
<?xml version="1.0" encoding="UTF-8"?>
<wordsWithinAString>
<word>XSLT</word>
<word>xsl</word>
<word>analyze</word>
<word>string</word>
<word>instruction</word>
</wordsWithinAString>
4) XPath 3.1 analyze-string function
The XPath analyze-string function has the same purpose as the XSLT xsl:analyze-string instruction. An obvious difference between these two XSL language features is that xsl:analyze-string is an XSLT instruction written as an element in an XSL stylesheet, whereas analyze-string is an XPath library function called from within an XPath expression.
When authoring XSLT 3.0 stylesheets, the XPath 3.1 processing environment is available in an XSLT 3.0 processor. The availability of the analyze-string feature both as the XSLT 3.0 xsl:analyze-string instruction and as the XPath 3.1 function analyze-string doesn't mean that either of these is preferable over the other. When an XSLT 3.0 stylesheet needs the analyze-string feature, the XSL stylesheet author can use either the XSLT xsl:analyze-string instruction or the XPath analyze-string function.
As we'll see with the XPath function analyze-string examples in this section, it's probably somewhat simpler to use the xsl:analyze-string instruction than the XPath analyze-string function. This is my personal opinion as the author of this article; different XSL stylesheet authors have different preferences about whether to use the xsl:analyze-string instruction or the XPath analyze-string function.
Let's say that we have the following XSL stylesheet document [XSL3], which uses the XPath function analyze-string:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="analyze-string(info,'\s+')"/>
</xsl:template>
</xsl:stylesheet>
When the stylesheet document XSL3 transforms the XML document XML1 specified earlier in this article, the XSL transformation produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
<non-match>XSLT</non-match>
<match> </match>
<non-match>xsl:analyze-string</non-match>
<match> </match>
<non-match>instruction</non-match>
</analyze-string-result>
As we can see, this output contains string analysis information similar to the output of the stylesheet XSL1, which uses the xsl:analyze-string instruction.
The XPath and XQuery Functions and Operators 3.1 specification provides an XML Schema definition for the result of an analyze-string function call (the stylesheet XSL3's output conforms to this XML Schema document). Every call to the XPath function analyze-string returns an XML element conforming to this XML Schema. For reference, the XML Schema document for the XPath function analyze-string's result is available at: https://www.w3.org/TR/xpath-functions-31/#schema-for-analyze-string.
To summarize, the essential semantics of the XML structure of the result of the XPath function call analyze-string are the following:
1) The XPath function call analyze-string's result is a single XML element node named analyze-string-result, whose namespace is http://www.w3.org/2005/xpath-functions (which is commonly bound to the XML namespace prefix "fn").
2) In the function call analyze-string's result, the children of the XML element analyze-string-result form a sequence of the XML elements "match" and "non-match", in the order in which the corresponding substrings occur in the input string. Either a "match" or a "non-match" element can appear as the first child. As with the XSLT xsl:analyze-string instruction, these elements typically alternate, although directly adjacent matches produce consecutive "match" elements.
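One further detail of this result structure is that a "match" element can contain "group" elements, one for each parenthesized group of the regex that captured text. As a sketch based on an example in the XPath and XQuery Functions and Operators 3.1 specification, the XPath expression
analyze-string('2008-12-03', '^(\d+)\-(\d+)\-(\d+)$')
returns the following element (whitespace added for legibility):
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <!-- Each group element carries the group number in its nr attribute -->
  <match><group nr="1">2008</group>-<group nr="2">12</group>-<group nr="3">03</group></match>
</analyze-string-result>
The text of the match that lies outside any group (here, the two '-' characters) appears as text directly within the "match" element.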
As with other XSLT stylesheets, the result of the XPath function call analyze-string can be transformed to something other than the standard output of the analyze-string function call (for example, to make the final result of the XSLT stylesheet's output more user-friendly).
This is illustrated with the following XSLT stylesheet example ([XSL4]):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="fn"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<wordsWithinAString>
<xsl:apply-templates select="analyze-string(info,'\w+')/fn:match"/>
</wordsWithinAString>
</xsl:template>
<xsl:template match="fn:match">
<word>
<xsl:value-of select="."/>
</word>
</xsl:template>
</xsl:stylesheet>
When the XSL stylesheet XSL4 transforms the XML document XML1 specified earlier in this article, the stylesheet transformation produces the following result:
<?xml version="1.0" encoding="UTF-8"?>
<wordsWithinAString>
<word>XSLT</word>
<word>xsl</word>
<word>analyze</word>
<word>string</word>
<word>instruction</word>
</wordsWithinAString>
As we can see in the above XSL stylesheet example, the stylesheet XSL4 processes only the XML elements named "match" (and transforms each of them to an XML element named "word") from the result of the XPath function call analyze-string.
The result of the XPath analyze-string function (or of the XSLT instruction xsl:analyze-string) doesn't have to be emitted directly to the stylesheet's output. It can be post-processed by other XSLT language instructions, for example by grouping and aggregating the analyze-string function's output. Let's study this with the example below.
XML document [XML2]:
<?xml version="1.0" encoding="UTF-8"?>
<info>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis condimentum, orci in accumsan pulvinar,
orci diam condimentum dolor, at tincidunt ante lacus convallis turpis. Nunc metus risus, ultrices sit
amet pretium eu, rhoncus non nisl. Ut eu luctus magna. Sed quis lorem magna. Nunc malesuada velit volutpat,
lacinia odio ornare, mattis augue. Sed scelerisque urna et consectetur vulputate. Vivamus porttitor laoreet
nisl, lacinia blandit quam facilisis facilisis. Donec libero augue, facilisis eget blandit in, convallis
sed urna. Aliquam elementum dapibus malesuada. Fusce mattis ipsum eu viverra tincidunt. In hac habitasse
platea dictumst.
</info>
XSL stylesheet document [XSL5]:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="xs fn"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<groupsOfWords>
<xsl:for-each-group select="analyze-string(info,'[\s+|,|\.]+')/fn:non-match" group-by="string-length(.)">
<xsl:sort select="current-grouping-key()" data-type="number"/>
<wordsGroup strLength="{current-grouping-key()}" groupSize="{count(current-group())}">
<words>
<xsl:value-of select="string-join(for $nonMatchElem in current-group() return xs:string($nonMatchElem),',')"/>
</words>
</wordsGroup>
</xsl:for-each-group>
</groupsOfWords>
</xsl:template>
</xsl:stylesheet>
When the XML document XML2 is transformed by the XSL stylesheet XSL5 that's shown above, the following XSL transformation output is produced:
<?xml version="1.0" encoding="UTF-8"?>
<groupsOfWords>
<wordsGroup strLength="2" groupSize="9">
<words>in,at,eu,Ut,eu,et,in,eu,In</words>
</wordsGroup>
<wordsGroup strLength="3" groupSize="7">
<words>sit,sit,non,Sed,Sed,sed,hac</words>
</wordsGroup>
<wordsGroup strLength="4" groupSize="18">
<words>amet,elit,Duis,orci,orci,diam,ante,Nunc,amet,nisl,quis,Nunc,odio,urna,nisl,quam,eget,urna</words>
</wordsGroup>
<wordsGroup strLength="5" groupSize="16">
<words>Lorem,ipsum,dolor,dolor,lacus,metus,risus,magna,lorem,magna,velit,augue,Donec,augue,Fusce,ipsum</words>
</wordsGroup>
<wordsGroup strLength="6" groupSize="7">
<words>turpis,luctus,ornare,mattis,libero,mattis,platea</words>
</wordsGroup>
<wordsGroup strLength="7" groupSize="11">
<words>pretium,rhoncus,lacinia,Vivamus,laoreet,lacinia,blandit,blandit,Aliquam,dapibus,viverra</words>
</wordsGroup>
<wordsGroup strLength="8" groupSize="5">
<words>accumsan,pulvinar,ultrices,volutpat,dictumst</words>
</wordsGroup>
<wordsGroup strLength="9" groupSize="13">
<words>tincidunt,convallis,malesuada,vulputate,porttitor,facilisis,facilisis,facilisis,convallis,elementum,malesuada,tincidunt,habitasse</words>
</wordsGroup>
<wordsGroup strLength="10" groupSize="1">
<words>adipiscing</words>
</wordsGroup>
<wordsGroup strLength="11" groupSize="5">
<words>consectetur,condimentum,condimentum,scelerisque,consectetur</words>
</wordsGroup>
</groupsOfWords>
The XSL stylesheet XSL5 shown above uses the XPath analyze-string function, whose result is grouped and aggregated using the xsl:for-each-group instruction to provide a different aggregate data view of the analyze-string function's result.
5) XPath 3.1 tokenize function
Although studying XPath's tokenize function is not the topic of this article, it is useful to discuss an XSL stylesheet example that uses the XPath tokenize function to solve one of the use cases that was solved earlier in this article using the xsl:analyze-string instruction and/or the XPath function analyze-string.
Let's study the following XSLT stylesheet [XSL6]:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn0="http://fn0"
exclude-result-prefixes="xs fn0"
version="3.0">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<groupsOfWords>
<xsl:for-each-group select="fn0:getNonEmptyTokens(tokenize(info,'[\s+|,|\.]+'))" group-by="string-length(.)">
<xsl:sort select="current-grouping-key()" data-type="number"/>
<wordsGroup strLength="{current-grouping-key()}" groupSize="{count(current-group())}">
<words>
<xsl:value-of select="string-join(for $nonMatchElem in current-group() return xs:string($nonMatchElem),',')"/>
</words>
</wordsGroup>
</xsl:for-each-group>
</groupsOfWords>
</xsl:template>
<!-- Get sequence of "token" elements, for token strings having length > 0. -->
<xsl:function name="fn0:getNonEmptyTokens" as="element()*">
<xsl:param name="tokens" as="xs:string*"/>
<xsl:for-each select="$tokens[string-length(.) &gt; 0]">
<token><xsl:value-of select="."/></token>
</xsl:for-each>
</xsl:function>
</xsl:stylesheet>
The XSL stylesheet XSL6 illustrated above, when transforming the XML input document XML2 described earlier in this article, produces the same XSL transformation output as the XSL stylesheet XSL5.
In the XSL stylesheet XSL6 shown above, we've transformed the result of the XPath tokenize function to a node sequence using the stylesheet function fn0:getNonEmptyTokens, and subsequently grouped the result of the function call fn0:getNonEmptyTokens to produce the XSL stylesheet XSL6's final output.
As we discussed earlier in this article, the XPath tokenize function differs from the xsl:analyze-string instruction and the XPath function analyze-string in that the XPath tokenize function cannot return the regions of an input string that match the regex.
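As a design alternative, xsl:for-each-group can also group a sequence of atomic values directly, so wrapping the tokens in elements is optional. The following is a minimal sketch under that assumption (it is not one of this article's numbered examples, and it uses the simplified regex [\s,.]+, in which the characters inside the character class are all literals or character escapes):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <groupsOfWords>
      <!-- The predicate drops the zero-length tokens that tokenize returns
           when the input starts or ends with a separator -->
      <xsl:for-each-group select="tokenize(info, '[\s,.]+')[. != '']" group-by="string-length(.)">
        <xsl:sort select="current-grouping-key()" data-type="number"/>
        <wordsGroup strLength="{current-grouping-key()}" groupSize="{count(current-group())}">
          <words>
            <!-- current-group() is already a sequence of strings here -->
            <xsl:value-of select="string-join(current-group(), ',')"/>
          </words>
        </wordsGroup>
      </xsl:for-each-group>
    </groupsOfWords>
  </xsl:template>
</xsl:stylesheet>
For the XML input document XML2, which contains no characters other than letters, whitespace, commas and periods, this sketch should yield the same groups as the stylesheet XSL6.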
6) Conclusion
This article has discussed the XSLT 3.0 xsl:analyze-string instruction in detail, and the XPath 3.1 function analyze-string, which produces output with similar information to the xsl:analyze-string instruction. Both the XSLT xsl:analyze-string instruction and the XPath function analyze-string are useful XSL language features for analyzing XML and text string information using regular expressions.
We have also discussed using the XPath 3.1 function tokenize to do string information analysis using regular expressions, to solve use cases with objectives similar to those of the XSLT and XPath analyze-string language features.
This article hasn't explained the details of the regex 'flags' of the xsl:analyze-string instruction and the XPath analyze-string function. Regex flags are optional with these XSL language features. They are options that, among other things, allow regex matching to work in case-insensitive mode. XPath 3.1 regex flags are explained in detail at the link: https://www.w3.org/TR/xpath-functions-31/#flags. The regex syntax used by all the features in XSLT 3.0 and XPath 3.1 that require a regex is available at the link: https://www.w3.org/TR/xpath-functions-31/#regex-syntax.
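As a brief sketch of the case-insensitive flag only (the literal input string and the element names hits and hit are assumptions made for illustration, and this is not one of this article's numbered examples), the 'i' flag makes the regex xsl match regardless of letter case:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- The string is a literal, so any XML input document can be used -->
  <xsl:template match="/">
    <hits>
      <!-- flags="i" requests case-insensitive matching -->
      <xsl:analyze-string select="'XSLT xsl:analyze-string instruction'" regex="xsl" flags="i">
        <xsl:matching-substring>
          <hit><xsl:value-of select="."/></hit>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </hits>
  </xsl:template>
</xsl:stylesheet>
This sketch is expected to emit two hit elements, containing XSL (from "XSLT") and xsl (from "xsl:analyze-string"); without the flag only the second would match. The XPath function accepts the same flags as an optional third argument, as in analyze-string($input, 'xsl', 'i'), where $input is a hypothetical variable holding the input string.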
Users familiar with regexes in languages like XML Schema, Perl and Java should find XSLT 3.0 and XPath 3.1 regular expressions easier to learn. The following are links to a few of these other regex syntax definitions: https://www.w3.org/TR/xmlschema-2/#regexs, https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.
All the examples provided in this article have been successfully tested with the Saxon-HE XSLT 3.0 processor and Apache Xalan-J's XSLT 3.0 development build.
7) References
Following are the references to relevant W3C recommendations and XSLT processors.
- XSL Transformations (XSLT) Version 3.0
- XML Path (XPath) Language 3.1
- XPath and XQuery F&O 3.1
- Saxon: XSLT, XQuery, and XML Schema processors
- Apache XalanJ: XSLT processor