
Understanding W3C Schema Complex Types
by Donald Smith
August 22, 2001
My response to that assertion is to ask why would you want to write complex types without understanding them, especially when they are easily understandable? There are four things you need to know in order to understand complex types in W3C Schemas. These four things are easy to understand. See what you think about Kawaguchi's argument after learning them.
One of the most important, but least emphasized, aspects of W3C schemas is the type hierarchy. The importance of the type hierarchy can hardly be overstated. Why? Because the syntax for expressing types in schemas follows precisely from the type hierarchy.
XML Schema Part 2: Datatypes, Section 3 contains a helpful graphic that explains the schema type hierarchy.
Schema types form a hierarchy because they all derive, directly or
indirectly, from the root type. The root type is
anyType. (You can actually use anyType in an
element declaration; it allows any content whatsoever.) The type
hierarchy first branches into two groups: simple types and complex
types. Here we encounter the first two of the four things you need to
know in order to understand complex types: first, derivation is
the basis of connection between types in the type hierarchy; and,
second, the initial branching of the hierarchy is into simple and
complex types.
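As a minimal sketch of the parenthetical remark above, an element can be declared with anyType directly. The element name "notes" is hypothetical, and the fragment assumes, like the other fragments in this article, that the W3C XML Schema namespace is the default namespace of the schema document.

```xml
<!-- Hypothetical element: its type is the root of the type hierarchy, -->
<!-- so it accepts any attributes, any child elements, and any text.   -->
<element name="notes" type="anyType" />
```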
Kinds of Derivation
To derive a type means to take an existing type (called the "base") and modify it in some way so as to produce a new type. There are four kinds of derivation: restriction, extension, list, and union. This discussion looks at derivation by restriction and extension since they are the most commonly used.
Derivation by restriction takes an existing type as the base and creates a new type by limiting its allowed content to a subset of that allowed by the base type. Derivation by extension takes an existing type as the base and creates a new type by adding to its allowed content.
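Restriction and extension are illustrated later in this article. For completeness, the other two kinds of derivation, list and union, might look like the following sketch. The type names are hypothetical, and the W3C XML Schema namespace is assumed to be the default namespace.

```xml
<!-- Derivation by list: a whitespace-separated list of integers, e.g. "1 2 3" -->
<simpleType name="integerListType">
  <list itemType="integer" />
</simpleType>

<!-- Derivation by union: a value that is either an integer or a date -->
<simpleType name="integerOrDateType">
  <union memberTypes="integer date" />
</simpleType>
```

Both forms apply only to simple types; neither adds element children or attributes.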
Simple Types versus Complex Types
Simple types and complex types differ in this way: simple types cannot have element children or attributes; complex types may have element children and attributes.
Tracing the type hierarchy down the branch of simple types, we see
that the first simple type is anySimpleType, which is
also a type that you could actually use. W3C XML Schema has 44 built-in
simple types, each of which derives from anySimpleType,
and all but three of which derive by restriction. Not a single one
derives by extension. To extend a simple type would mean to add
element children or an attribute. This contradicts the definition of
simple type in W3C Schemas and is thus prohibited.
Deriving a New Simple Type
Thinking in terms of the type hierarchy, it ought to be relatively
straightforward to derive a new simple type, "myNameType", that restricts
its base type, "string", to a specific, fixed subset, "Don Smith". The
W3C XML Schema fragment for expressing myNameType is
<simpleType name="myNameType">
  <restriction base="string">
    <enumeration value="Don Smith" />
  </restriction>
</simpleType>
As you can see, the XML Schema for this type definition follows the type hierarchy exactly -- except for the enumeration element, one of the twelve facets that can be used to qualify types. We won't look at facets here since our concern is with the relation between the type hierarchy and the Schema syntax for expressing types. I simply used this one to complete the example. (See Table B1.a, "Simple Types and Applicable Facets," in Schema Part 0: Primer for a convenient list of facets.)
Now I just have to associate myNameType with an
element, and then I can use the type in an XML document. So the
element declaration
<element name="employee" type="dc:myNameType" />
lets me use <dc:employee>Don Smith</dc:employee> in an XML document instance.
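The fragments in this article leave out the enclosing schema element. A minimal sketch of a complete schema document follows, assuming the W3C XML Schema namespace is the default namespace and that the "dc" prefix is bound to the schema's target namespace. The namespace URI http://example.com/dc is a placeholder chosen for illustration, not one taken from the article.

```xml
<!-- A complete schema document wrapping the fragments above. -->
<!-- The target namespace URI is a hypothetical placeholder.  -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:dc="http://example.com/dc"
        targetNamespace="http://example.com/dc">

  <!-- Unprefixed "string" resolves to the built-in type via the default namespace -->
  <simpleType name="myNameType">
    <restriction base="string">
      <enumeration value="Don Smith" />
    </restriction>
  </simpleType>

  <!-- "dc:myNameType" refers to the type defined in the target namespace -->
  <element name="employee" type="dc:myNameType" />

</schema>
```

Under these assumptions, an instance document would bind the same prefix on its root element, for example <dc:employee xmlns:dc="http://example.com/dc">Don Smith</dc:employee>.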
What about Complex Types?
Does the syntax for complex types also follow the logic of the type hierarchy? Yes. But the type hierarchy diagram doesn't help us at this point because it doesn't provide two crucial pieces of information about complex types. However, once you understand these two points, complex types lose their indecipherable complexity and become quite intelligible.
The Two Forms of Complex Types
Complex types are divided into two groups: those with simple content and those with complex content. And that leads us to the third thing you need to know in order to understand complex types: while both forms of complex type allow attributes, only those with complex content allow child elements; those with simple content only allow character content.
In other words, the difference between complex types with simple content and complex types with complex content is that the former do not allow element children while the latter do. That's it. The two forms of complex type are represented in what I call the Schema Type Decision Tree (PDF) under the complex type branch.
Let's suppose I want to add an attribute to
myNameType. Adding an attribute to a simple type always
moves it into the complex type branch of the type hierarchy. Once on
the complex type branch, I must ask a second question. Do I want the
new type to allow element content? If I don't, then my new type must
be a complex type with simple content. After that, I simply take
dc:myNameType as the base type and extend it by adding an
attribute:
<complexType name="myNewNameType">
  <simpleContent>
    <extension base="dc:myNameType">
      <attribute name="position" type="string" />
    </extension>
  </simpleContent>
</complexType>
Now, after declaring my element "employee" to be of
myNewNameType, I can have <dc:employee
position="trainer">Don Smith</dc:employee> in my XML
document instance.
It may seem odd that adding an attribute to a simple type requires the creation of a new complex type, one that has simple content to boot. But that's the logic of the type hierarchy: a type that has attributes must be a complex type, and that type can either allow element children or not. Perhaps an odd logic, but it is intelligible.
Let's suppose now that I want my complex type to have child elements. That requires a complex type with complex content. So I simply add my content model and attributes (if any). That's easy. But maybe too easy. We must be careful not to skip over a crucial fact that makes a big difference.
Adding a content model is still a derivation of a new type from
some base type. If I do not take an existing complex type as the base
for the new derivation, what will I use for a base type? I'll use
anyType. The vast majority of types that allow element content are
restrictions of anyType. For example,
<complexType name="myNewNameType">
  <complexContent>
    <restriction base="anyType">
      <sequence>
        <element name="name" type="string" />
        <element name="location" type="string" />
      </sequence>
      <attribute name="position" type="string" />
    </restriction>
  </complexContent>
</complexType>
<element name="employee" type="dc:myNewNameType" />
The type associated with "employee" now has an element named "name" followed by an element named "location". Further, "employee" can have an attribute named "position":
<dc:employee position="trainer">
  <dc:name>Don Smith</dc:name>
  <dc:location>Dallas, TX</dc:location>
</dc:employee>
The logic behind the syntax is straightforward. I want a type that
allows child elements. That requires a complex type with complex
content, while still deriving a new type from a base type. In this
case I'm restricting anyType; I could as easily extend another type. I
add my content model and an attribute declaration. I'm done, and it
was all pretty easy.
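To illustrate the remark that I could as easily extend another type, here is a sketch that takes the complex-content version of myNewNameType above as its base and extends it. The type name "myExtendedNameType" and the element "phone" are hypothetical names introduced only for this example.

```xml
<!-- Hypothetical type: extends myNewNameType by appending one more element -->
<complexType name="myExtendedNameType">
  <complexContent>
    <extension base="dc:myNewNameType">
      <!-- The new sequence is appended after the base type's content model -->
      <sequence>
        <element name="phone" type="string" />
      </sequence>
    </extension>
  </complexContent>
</complexType>
```

An element of this type would contain "name", "location", and "phone", in that order, and would still allow the "position" attribute inherited from the base type.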
Ambushed by Abbreviation
But can't this be expressed more concisely? Yes, it can. There is
an abbreviated form for all complex type definitions that have complex
content and restrict anyType. You simply leave out the
<complexContent> and <restriction base="anyType"> elements:
<complexType name="myNewNameType">
  <sequence>
    <element name="name" type="string" />
    <element name="location" type="string" />
  </sequence>
  <attribute name="position" type="string" />
</complexType>
This type definition is equivalent to the previous one. And that
leads us to the fourth thing you need to know in order to understand
complex types: the default syntax for complex types is complex
content that restricts anyType.
Why didn't I show you the abbreviated syntax first? Because the abbreviation obscures the logic behind the default syntax. If all you see is <complexType> followed by a content model, it's totally confusing as to why complex types sometimes have <complexContent> or <simpleContent> child elements or, often, neither.
Now that you know the logic behind the two forms of complex type, you won't be confused when you see a complex type that has neither <complexContent> nor <simpleContent>. You know what the default is.
Those Tricky Empty Elements
Writing type definitions for empty elements turns out to be counter-intuitive, but, fortunately, the logic behind the complex type syntax still holds. Remember that an empty element is one that has neither data content nor child elements. It may have an attribute. Let's take the case of an empty element that doesn't have an attribute.
Your first inclination might be to associate the empty element with a simple type. But that won't work since simple types allow data content. So it must be a complex type. Then, ask yourself the next question. Will it allow element children? No. We need a <complexType> with <simpleContent>, right?
Wrong. Complex types with simple content also allow data content, and we want an empty element. That leaves us with <complexType> with <complexContent>, which ensures that there will not be any data content in the element. But we don't want child elements, either, and a complex type with complex content allows child elements. The key is that it doesn't require them. What do we do? Simply leave the content model out of the type definition:
<complexType name="processingHook">
  <complexContent>
    <restriction base="anyType">
    </restriction>
  </complexContent>
</complexType>
<element name="callMyApp" type="dc:processingHook" />
Our type definition, now associated with the element "callMyApp", allows the markup <dc:callMyApp/> to occur in my XML document instance.
Now apply the default syntax for complex types to this type definition. A definition equivalent to the one above is
<complexType name="processingHook"> </complexType>
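The same reasoning covers the other case mentioned above, an empty element that does have an attribute. The following is a sketch only; the type name "processingHookWithMode" and the attribute "mode" are hypothetical.

```xml
<!-- Full form: no content model, one attribute -->
<complexType name="processingHookWithMode">
  <complexContent>
    <restriction base="anyType">
      <attribute name="mode" type="string" />
    </restriction>
  </complexContent>
</complexType>

<!-- Equivalent abbreviated form; a real schema would contain only one -->
<!-- of these two definitions, since they share the same name.        -->
<complexType name="processingHookWithMode">
  <attribute name="mode" type="string" />
</complexType>
```

An element of this type may carry the "mode" attribute but still allows neither data content nor child elements.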
It's no wonder that people get confused about complex types. They generally don't realize that all complex types are divisible into two kinds: those with simple content and those with complex content. The reason people don't generally realize this is that they normally learn the abbreviated syntax first. But, as we've seen, if you learn the full syntax and the logic behind it first, then the abbreviated syntax, and complex types in general, cease to be a befuddling conundrum.
If all of this is now as clear to you as it is to me, you don't have to trust anyone's assurances that you should use complex types without understanding them. You can now use and understand them.
Convinced that W3C XML Schemas aren't so hairy after all, or do you still have questions? Ask them here in our forum.
Comment on this Article
- Oh my god! What a helpful article
2007-11-25 06:20:21 dogdog172
This is by far the most useful article I have read for understanding simpleType, complexType, simpleContent and complexContent, even though it is an article from 2001. I now understand what they are and how to use them. Really, thumbs up for the author.
By the way, it is not that difficult to state the difference between complexType with simpleContent and complexType with complexContent explicitly. How come it is so hard to find in other web tutorials or XML books?
- From DTD to W3C Schema... combining datatyping and attribute values
2004-08-03 04:59:33 Ingrid
Am I right in thinking that datatyping at element level ie <xs:element name="num" type="xs:integer">
and specifying a choice of attribute values ie
<xs:attribute name="kind">
  <xs:simpleType>
    <xs:restriction base="xs:token">
      <xs:enumeration value="volume_number"/>
      <xs:enumeration value="page_range"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>
......does not go together??????
**************************
Setting out to convert the following DTD element specification...
<!ELEMENT num (#PCDATA | emph)*>
<!ATTLIST num
kind (volume_number | page_range) #REQUIRED>
... into Schema....
<xs:element name="num">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="emph"/>
    </xs:sequence>
    <xs:attribute name="kind" use="required">
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="volume_number"/>
          <xs:enumeration value="page_range"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
.... I realised that if I want to make use of the additional datatyping capabilities of Schema (ie adding a datatype of xs:integer to the element "num" to allow numerical element content only, for example), I cannot at the same time specify a choice of required attribute values, i.e. volume_number and page_range for this element. As soon as I give the element a datatype
ie <xs:element name="num" type="xs:integer"> no further child element or attribute specifications are allowed...
I know I can set the integer at the attribute level, which means that the data needs to be entered within the <num ...."31"> tag rather than between an opening and closing
<num kind="volume_number">31</num>
but the element is part of a digitisation project template for transcribers and the idea is, to set up the template in such a way that it can check data entry at element level, ie keyers only need to enter data between element tags, not at attribute level....
I guess I am stuck with adding child elements to the <num> element ie <volume_number> <page_range> and add some datatyping to these...???
- fantastic
2004-06-20 10:11:23 udee
I was totally confused by complex types and had no idea whatsoever how to use simple and complex content. Donald Smith's article corrected all that. This article deserves a perfect 10. I shall be recommending it to many other people who I am sure are as confused as I was.
Thank you, Mr. Smith
- arrays
2003-10-14 08:15:40 natali ku
In the generated WSDL file I get the following:
<complexType name="ArrayOf_tns1_ICompanyVO">
  <complexContent>
    <restriction base="soapenc:Array">
      <attribute ref="soapenc:arrayType" wsdl:arrayType="null[]"/>
    </restriction>
  </complexContent>
</complexType>
<element name="ArrayOf_tns1_ICompanyVO" nillable="true" type="intf:ArrayOf_tns1_ICompanyVO"/>
I think wsdl:arrayType="null[]"/> is wrong; it is supposed to be wsdl:arrayType="ICompanyVO[]"/>. Can you give me a hint as to what could be wrong? Thanks. natali
- Good! Complex Types have use!
2002-01-31 03:25:47 Tim Salmon
Thank you for the article. Hopefully now when someone searches for "XML Schema and Inheritance" in a search engine it will pull up this article.
- XSD enumerations
2001-11-14 05:27:13 Harry Hirsch
Hi folks,
I want to give a list of enumerations to an attribute, but also keep the option of allowing values other than those listed in xsd:enumeration to be valid. Is that possible?
- XSD enumerations
2005-07-19 01:59:30 JAPISoft
Just use a union like this:
<simpleType>
  <union>
    <simpleType>
      <restriction base="xs:string">
        <enumeration.../>
      </restriction>
    </simpleType>
    <simpleType>
      ...
    </simpleType>
  </union>
</simpleType>
Best regards,
A.Brillant
EditiX - XML Editor and XSLT Debugger
http://www.editix.com
- Not that simple
2001-09-07 09:24:22 Vitaliy Zavesov Zavesov
I agree that schemas are understandable, as any programming language is, but they are by no means simple. Consider the following example,
<complexType name="SomeName">
  <complexContent>
    <extension base="SomeBase">
      <sequence>
        elements
      </sequence>
      attributes
    </extension>
  </complexContent>
</complexType>
1) Why do we need to specify that SomeName is a complex type? It was my experience that one barely ever uses the simple type. Why does W3C not make the complex type a default?
2) The same applies to complex content. W3C can make it a default.
3) Why not make the sequence model group a default since it is used most often?
If all of these suggestions were implemented, the above example would be changed to
<type name="SomeName">
  <extension base="SomeBase">
    elements
    attributes
  </extension>
</type>
This would make more sense and save 6 (!) lines of code. Or would it be too difficult to make software that would validate it correctly?
- Not that simple
2003-07-14 10:53:13 Eric Hsu
I too am wondering about having that <sequence> element all over the place, even for complex types that contain only one other element.
Is there a way around this?
It wasn't necessary in schema 2000/10, I think.
- Great Article
2001-08-29 08:58:55 Priscilla Walmsley
Thanks so much for this article. I am quite frustrated by so many people complaining that XML Schema is so complicated when really it is not that bad.
The article was easy to follow and also 100% accurate. I should know - I'm on the Schema WG.
Thanks,
Priscilla Walmsley
priscilla@walmsley.com
- simpleTypes not that "odd"
2001-08-25 03:50:37 Francis Norton
you say
"It may seem odd that adding an attribute to a simple type requires the creation of a new complex type, one that has simple content to boot."
but consider that simpleTypes are used to define attributes as well as elements. Are attributes allowed to have element content or their own attributes? That, to me, explains why these are disallowed in simpleTypes.
- Understanding W3C Schema Complex Types
2001-08-24 17:55:01 Perry Molendijk
Well I can go back to all those articles and tutorials I didn't quite understand. There is nothing like getting the basics right. Thanks a lot for this article.
- Small error
2001-08-23 22:04:24 Michael Strasser
Complex types are much clearer now, thanks. There is a small error, however.
You wrote: "There is an abbreviated form for all complex type definitions that have complex content and restrict anyType. You simply leave out the <complexType> and <restriction base="anyType"> elements"
The last sentence should read, "You simply leave out the <complexContent> and <restriction base="anyType"> elements".
- Corrected: Small error
2001-08-24 05:52:57 Edd Dumbill
Thanks for pointing this out, we've fixed it in the article.
