XML.com

Unique particle attribution in the XML Schema Language

June 13, 2018

Mukul Gandhi

This article discusses the XML Schema "Unique Particle Attribution" constraint.

1. Introduction

This article discusses the XML Schema language's "Unique Particle Attribution" constraint, which XML Schema users encounter frequently while working with XSD documents. XML Schema processors and the XML Schema literature usually refer to it as the UPA constraint. The constraint is defined in both XML Schema 1.0 and 1.1, and its essence has not changed between the two versions. It is also mandatory: a schema that contains a UPA ambiguity is not valid and cannot be used for validation. This article also describes how the UPA constraint differs between XSD 1.0 and XSD 1.1. The differences are few, but they are worth knowing if you plan to use XSD 1.1 as your preferred XML Schema language.

XML Schema authors need to understand the UPA constraint, because an XSD document that violates it will fail to compile when fed to an XSD processor.

All the examples in this article were tested with Xerces-J's XSD processor. Other conforming processors should exhibit similar behavior.

2. Basic examples of UPA violation

Subsections 1) and 2) below discuss a few basic examples of UPA violations in XML Schema 1.0 and 1.1.

1) UPA violations in XSD 1.0 schema

The following XSD schema contains a UPA violation:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="a" type="xs:string" minOccurs="0"/>
                <xs:any processContents="skip" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

The sibling xs:element (with name="a") and xs:any declarations are an example of a UPA violation in XSD 1.0 (this is not a UPA violation in XSD 1.1, as discussed later).

Suppose you try to validate the following XML instance document against the above XSD document:

<?xml version="1.0" encoding="UTF-8"?>
<X>
    <a>hello</a>
</X>

The validation fails with an XSD 1.0 processor. Xerces-J's XSD 1.0 processor emits the following error when this validation is attempted:

[Error] x.xsd:5:21: cos-nonambig: a and WC[##any] (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.

Other conforming XSD 1.0 processors will emit a similar error message.

Looking at the XSD document of this example, it is evident that the sibling xs:element (with name="a") and xs:any declarations are ambiguous: when validating an XML instance element "a", the XSD 1.0 processor cannot decide whether to select the xs:element declaration or the xs:any declaration.

Note that the UPA violation is reported as soon as we attempt to compile the XSD schema (for example, through the JAXP API), before any XML instance document is validated.

Now suppose the xs:sequence in the first schema of this section is changed slightly, as follows (in either an XSD 1.0 or 1.1 document):
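One possible way to remove this ambiguity in XSD 1.0 is to restrict the wildcard's namespace constraint so that it can no longer match the element "a". The sketch below assumes that the wildcard only needs to admit elements from other namespaces; that assumption is mine, and it does change what the schema accepts, because namespace="##other" in a schema without a target namespace matches only namespace-qualified elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="a" type="xs:string" minOccurs="0"/>
                <!-- ##other: this schema has no target namespace, so only
                     namespace-qualified elements match the wildcard; the
                     unqualified "a" can match only the declaration above -->
                <xs:any namespace="##other" processContents="skip" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

With this change the element declaration and the wildcard no longer overlap, so an XSD 1.0 processor should be able to compile the schema.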

<xs:sequence>
    <xs:any processContents="skip" minOccurs="0"/>
    <xs:any processContents="skip" minOccurs="0"/>
</xs:sequence>

The two xs:any (wildcard) particles (we can call them xs:any declarations for now, instead of particles) then present a UPA ambiguity: the XSD processor cannot decide which xs:any declaration to select when validating an XML instance element.

For this modified XSD document, Xerces-J's XSD processor (both the 1.0 and 1.1 processors) emits the following error message:

[Error] x.xsd:5:21: cos-nonambig: WC[##any] and WC[##any] (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
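If the intent of the two wildcards is simply to allow up to two arbitrary elements, one possible fix is to use a single wildcard particle with an occurrence range. This is a sketch under that assumption about the intent, not a result from the tests above.

```xml
<xs:sequence>
    <!-- a single wildcard particle that may match zero, one or two elements -->
    <xs:any processContents="skip" minOccurs="0" maxOccurs="2"/>
</xs:sequence>
```

Because every matched element is now attributed to the same particle, there is nothing left for the processor to choose between.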

2) UPA violations in XSD 1.1 schema

Most UPA violations that exist in XSD 1.0 are also violations in XSD 1.1. The following is one important exception:

When a wildcard and an element particle compete to validate an XML instance element, an XSD 1.0 processor reports a UPA violation, whereas an XSD 1.1 processor accepts the schema and prefers the element particle over the wildcard particle.
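To make this rule concrete, here is how an XSD 1.1 processor would be expected to attribute the children of an instance validated against the first schema of subsection 1) (an optional "a" followed by an optional wildcard). The element name "other" is a hypothetical name chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<X>
    <!-- Matches both the declaration of "a" and the wildcard.
         XSD 1.1 attributes it to the element declaration, so its
         content is validated as xs:string. -->
    <a>hello</a>
    <!-- Matches only the wildcard. With processContents="skip",
         its content is not validated. -->
    <other>anything</other>
</X>
```

A practical consequence is that the content of "a" is always checked against its declared type and never silently skipped by the wildcard.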

3. XML Schema Particles

It is essential to know exactly what "particle" means in the XML Schema language; the UPA constraint is, after all, about XSD particles.

According to the XML Schema specification:

"When an element is validated against a complex type, its sequence of child elements is checked against the content model of the complex type and the children are attributed to Particles of the content model."

An XML Schema particle is an XML Schema component with the following properties:

{min occurs}
An xs:nonNegativeInteger value. Required.

{max occurs}
Either a positive integer or unbounded. Required.

{term}
A Term component. Required.

Particles also have an "annotations" property, which is omitted above because it is a documentation feature. As a rule of thumb, any XSD element that accepts the "minOccurs" and "maxOccurs" attributes corresponds to an XSD particle (note that such an element is not itself a particle; it corresponds to one). The following XSD elements accept "minOccurs" and "maxOccurs": a) local <xs:element> declarations, b) the model groups <xs:all>, <xs:sequence> and <xs:choice>, c) group references <xs:group> and d) the wildcard <xs:any>. All of these therefore correspond to XSD particles.

The term "XML Schema component" means that the XML Schema processor converts the various XSD constructs of an input XSD document into in-memory components when it compiles the document. Loosely speaking, a software component is an abstraction with attributes (in the sense of an OO object) and behavior.

The "term" of an XSD particle is one of three kinds: an element declaration, a wildcard or a model group. XSD particles therefore come in the same three kinds. A model group, in turn, is one of three kinds: all, choice or sequence.

It also follows from the above definitions that XSD particles exist in the XML schema, not in the XML instance documents being validated.
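The correspondence between XSD elements and particles can be sketched with a small schema. Everything in it (the names "g", "particles", "a", "b", "c" and "d") is hypothetical and chosen only for illustration; the comments state which term and occurrence range each particle has.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- A named model group definition. It takes no minOccurs/maxOccurs;
         it is the reference to it further down that corresponds to a particle. -->
    <xs:group name="g">
        <xs:sequence>
            <xs:element name="d" type="xs:string"/>
        </xs:sequence>
    </xs:group>

    <xs:complexType name="particles">
        <!-- particle: term = model group (sequence), occurs 1..1 -->
        <xs:sequence>
            <!-- particle: term = element declaration, occurs 0..1 -->
            <xs:element name="a" type="xs:string" minOccurs="0"/>
            <!-- particle: term = model group (choice), occurs 1..unbounded -->
            <xs:choice maxOccurs="unbounded">
                <xs:element name="b" type="xs:string"/>
                <xs:element name="c" type="xs:string"/>
            </xs:choice>
            <!-- particle: term = the model group defined by "g", occurs 0..1 -->
            <xs:group ref="g" minOccurs="0"/>
            <!-- particle: term = wildcard, occurs 0..unbounded -->
            <xs:any namespace="##other" processContents="lax"
                    minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>
```

The wildcard is limited to namespace="##other" so that it cannot compete with the unqualified element declarations before it, which keeps this illustration free of UPA violations.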

4. Detailed look at "Unique Particle Attribution" constraint

I'll use a few examples to illustrate the different kinds of UPA violations that may be present in XML Schema documents.

Example 1:

Consider the following XSD fragment (borrowed from the W3C XML Schema wiki),

<xs:complexType name="bad1">
    <xs:sequence>
        <xs:choice>
            <xs:element name="A" type="xs:string" minOccurs="0" />
            <xs:element name="B" type="xs:string" minOccurs="0" />
        </xs:choice>
        <xs:choice>
            <xs:element name="A" type="xs:string" minOccurs="0" />
            <xs:element name="C" type="xs:string" minOccurs="0" />
        </xs:choice>
    </xs:sequence>
</xs:complexType>

This fragment has a UPA violation. To match an element "A" in the XML instance document, the XSD processor could use the element declaration of "A" from either of the <xs:choice> constructs. Because the XSD processor has no unique validation path for this example, an XML Schema validator (both 1.0 and 1.1) reports a UPA violation error.
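One way to rewrite such a content model so that it accepts the same element sequences (empty, A, B, C, AA, AC, BA, BC) without ambiguity is to make the leading element mandatory within each branch. The type name "good1" is hypothetical, and this is a sketch rather than the wiki's own solution.

```xml
<xs:complexType name="good1">
    <!-- the whole choice is optional, which covers empty content -->
    <xs:choice minOccurs="0">
        <xs:sequence>
            <!-- first element: A or B, both mandatory within the choice -->
            <xs:choice>
                <xs:element name="A" type="xs:string"/>
                <xs:element name="B" type="xs:string"/>
            </xs:choice>
            <!-- optional second element: A or C -->
            <xs:choice minOccurs="0">
                <xs:element name="A" type="xs:string"/>
                <xs:element name="C" type="xs:string"/>
            </xs:choice>
        </xs:sequence>
        <!-- a lone C -->
        <xs:element name="C" type="xs:string"/>
    </xs:choice>
</xs:complexType>
```

At every position the next element name now determines a single particle: a first "A" can only match the first inner choice, and a second "A" can only match the second one.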

Example 2:

Consider the following fragment of an XSD document,

<xs:complexType name="bad">
    <xs:sequence>
        <xs:group ref="grp1" minOccurs="0"/>
        <xs:group ref="grp1"/>
    </xs:sequence>
</xs:complexType>

<xs:group name="grp1">
    <xs:sequence>
        <xs:element name="a" type="xs:string"/>
        <xs:element name="b" type="xs:string"/>
    </xs:sequence>
</xs:group>

This fragment also has a UPA violation. To validate the element sequence {<a>, <b>} against complex type "bad", the XSD processor (both 1.0 and 1.1) cannot decide whether to select the 1st <xs:group ref="grp1" ... or the 2nd one.
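Since the two references together allow the group to occur once or twice, a possible unambiguous rewrite is a single reference with an occurrence range. The type name "good" is hypothetical, and "grp1" is the group defined above.

```xml
<xs:complexType name="good">
    <xs:sequence>
        <!-- one particle that occurs once or twice, replacing the
             optional reference followed by the mandatory one -->
        <xs:group ref="grp1" maxOccurs="2"/>
    </xs:sequence>
</xs:complexType>
```

The accepted content is unchanged ({<a>, <b>} or {<a>, <b>, <a>, <b>}), but every <a> is now attributed to the same particle.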

Example 3:

Consider the following XSD document,

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X" type="xs:string"/>

    <xs:element name="Y" type="xs:string" substitutionGroup="X"/>

    <xs:complexType name="bad">
        <xs:sequence>
            <xs:element ref="X" minOccurs="0"/>
            <xs:element ref="Y" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

The above XSD document also presents a UPA ambiguity (in both the 1.0 and 1.1 versions of the XML Schema language). Element Y can substitute for element X (X is the head of the substitution group), so an instance element Y can match either the reference to X, through substitution, or the reference to Y. Loosely speaking, from the perspective of the UPA constraint the two references overlap, and the XSD processor cannot decide which one to select.
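One possible unambiguous rewrite, assuming the intended content is "nothing, or one X (or its substitute Y) optionally followed by a Y", makes the first reference mandatory inside an optional sequence. The type name "good" is hypothetical; X and Y are the global declarations from the schema above.

```xml
<xs:complexType name="good">
    <!-- the sequence as a whole is optional, which covers empty content -->
    <xs:sequence minOccurs="0">
        <!-- the first child, whether X or Y, is always attributed here -->
        <xs:element ref="X"/>
        <!-- only a Y that follows the first child is attributed here -->
        <xs:element ref="Y" minOccurs="0"/>
    </xs:sequence>
</xs:complexType>
```

A leading Y can no longer skip the first reference, so it has exactly one particle to match.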

5. Summary

This article discussed the details of the "Unique Particle Attribution" (UPA) constraint in the XML Schema language. We must take care when writing XSD documents for validation: they must not contain UPA violations, otherwise the XSD processor simply cannot compile them. I hope this article has made clear to novice XSD users what a UPA violation means and how to deal with it.

6. References