XML, the Web, and Beyond

November 10, 2004

Welcome to this week's column. I'm excited to tell you about the changes planned for next year's XML Europe conference, and to report on a discussion about when to use multiple schemas for XML documents.

XTech 2005: The New Face of XML Europe

Over the past few years, I have had the pleasure of being the program chair for the XML Europe conference. The elder sister of the main U.S. event, XML Europe has for years been an integral part of the XML community and, before that, the SGML community.

Following the changing face of XML's development, the conference has itself evolved and changed. This year I'm introducing further changes to keep pace with XML's usage and development. It has long been held that XML is "the ASCII of the future," and that is increasingly being realized. As a result, XML is now used so broadly that it is often no longer the sole or dominant technical focus of the applications built on it.

For a conference chair, this can present some problems. There are many compelling areas where XML technology is vital that do not see themselves as part of the core XML world. Yet many XML practitioners find themselves working in these areas, and many more application areas would benefit from an influx of skilled XML professionals. A conference remaining focused on the core will become startlingly dull, yet there is the risk of pleasing nobody by putting together too broad a program.

With the coming year's conference, I've decided to address this by adding topics that are very strongly related to XML, yet also have their own communities and concerns. While two tracks, Core Technologies and Applications, will continue very much to be the home of XML technology, two new tracks will expand the conference's horizons.

The first of these new tracks is Browser Technology. The development of standards-based user interfaces is very much alive and kicking. Mozilla's Firefox browser is starting to take chunks out of Microsoft Internet Explorer's dominance, and at the same time provides a rich client development platform. Opera's browser is innovating on mobile platforms, as is Apple's Safari browser on the Mac. Meanwhile, Microsoft's XAML provides a new avenue of competition and innovation.

From the standards point of view, W3C's XForms is of pressing interest to many, including governments and large corporations. W3C's Compound Document Format group is trying to solve the problem of mixed-media XML documents. And the WHATWG group is seeking to provide an alternative to the W3C for moving web markup forward.

The second new track will be called Open Data. An increasing number of information owners are choosing to expose their data on the Web. Opening up data encourages its creative reuse, empowers citizens, and can create new commercial opportunities. Along with governments, commercial organizations and content owners such as Amazon, Google, and the BBC are experimenting with open data. At an individual level, exciting open data developments are happening through movements such as blogging and social networking.

The Open Data track will address concerns at all levels, from business and policy through to implementation, and cover topics such as open government, business models and deployment issues for public-facing web services, open access to scientific data, licensing and intellectual property concerns, blogging and personal content, and the Semantic Web.

I hope you'll agree that these two new tracks are not only very relevant for many XML practitioners, but focus on some of the most exciting ways in which XML and the Web are currently developing. And, or so I've learned from conversations with my successor, Kendall Clark, they're also similar to the new content directions XML.com is exploring.

To reflect the change in the conference composition, we're changing the name from XML Europe to XTech 2005. This hopefully still carries the connotation of XML, but also the message of broader inclusion of extensible technologies and their applications. Our subtitle, "XML, the Web, and Beyond," echoes this.

XTech will remain as committed as ever to the XML community. My great hope for the conference is to foster greater communication and exchange of skills and ideas with the neighboring communities who are building on the foundation of XML and the Web.

The XTech 2005 call for participation opens this week. To find out more about how you can be involved, visit the XTech web site.

There is No One True Schema

And now back to the XML mailing lists. Some illuminating light was shed on the question of how to meet many potential needs with a single schema, thanks to a post from Alison Bloodworth. Bloodworth writes:

I am creating a schema for a university "Event" that will be used by the event calendars on the University of California, Berkeley campus. However, the idea is to design a schema that can be reused by any university, so I'd like to make the schema as flexible and extensible as possible.

One question I'm struggling with is whether to make most of the elements optional. This would result in the most flexible schema, but it has been suggested that having nearly everything optional will not result in a helpful model of an event unless the schema is restricted for a particular university (e.g. using the same elements, but making some required). However, doing restriction is rather messy in that you have to basically rewrite the whole schema.

In response, Michael Kay says that there doesn't have to be a single schema. Instead, he suggests using a meta-schema and transformations to generate the required target schema.

The right solution here might be a meta-schema that can be transformed (using a set of input parameters) into the target schema you actually use for validation.

Elsewhere on the mailing list, Burak Emir was questioning the need to write schema documents in XML itself, and the value of dynamically generated schemas. Michael Kay responded, tying in his conversation with Bloodworth, and refuting Emir's assertion that "The whole point of schemas is to be a widespread, well-understood description of instances."

It seems entirely legitimate to me to apply different schemas to the same document at different stages of a workflow, or for senders of documents to apply stronger validation criteria than recipients of the same documents.

Emir responded by asking whether Kay's response actually necessitates the use of dynamic schemas. He also added:

I think the very fact that somebody can write a mechanical transform to generate one schema from another hints at enough anticipation of requirements that the original schema could have been written in an extensible way in the first place.

His latter point is, I think, refuted well enough by the troubles Bloodworth was having using W3C XML Schema to do just that. Kay elaborated on why dynamically generated schemas are useful:

If you can think of a better way of maintaining several schemas that are identical in most respects, but vary in terms of which elements/attributes are optional, then let me know.

This is a classic "conditional compilation" scenario, and for languages based on XML, XSLT provides a powerful tool for such use cases.
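Kay's suggestion is to do this with XSLT. As a rough illustration of the same idea in a different language, here is a minimal Python sketch that treats a plain list of field names as the "meta-schema" and a set of required fields as the input parameters. The field names and the build_event_schema function are hypothetical, invented for this example, and not taken from Bloodworth's schema or from the thread.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # so the output uses the familiar xs: prefix

# Hypothetical meta-schema: the child elements of an <event>, in order.
EVENT_FIELDS = ["title", "start", "location", "sponsor"]


def build_event_schema(required):
    """Return an XSD document (as a string) in which only the fields
    named in `required` are mandatory; every other field is optional."""
    unknown = set(required) - set(EVENT_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    schema = ET.Element(f"{{{XS}}}schema")
    event = ET.SubElement(schema, f"{{{XS}}}element", name="event")
    ctype = ET.SubElement(event, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for field in EVENT_FIELDS:
        ET.SubElement(
            seq,
            f"{{{XS}}}element",
            name=field,
            type="xs:string",
            # The only thing that varies between generated schemas.
            minOccurs="1" if field in required else "0",
        )
    return ET.tostring(schema, encoding="unicode")


# One source of truth, two target schemas.
campus_schema = build_event_schema({"title", "start", "location"})
exchange_schema = build_event_schema({"title"})
```

Each university keeps only its own short list of required fields, and nobody has to rewrite the whole schema by hand in order to restrict it.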

Peter Hunsberger explained to the list why he uses dynamically generated schemas:

In our cases, we have a lot of metadata described in a relational database. There are customizations of that metadata that select specific pieces based on the authorizations of the user and the usage context of the metadata. The only time we need a schema is for the description of a piece of instance data that is travelling beyond the boundaries of the system, so we generate the schema as we need it.
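Hunsberger does not describe his implementation, so the following Python sketch is only a guess at the general shape of such a system. It assumes a hypothetical field_meta table in which each field carries a minimum clearance level, and it generates a schema containing only the fields a given user is allowed to see. The table layout, column names, and sample rows are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def demo_metadata():
    """Build an in-memory database holding hypothetical field metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE field_meta ("
        "position INTEGER, name TEXT, xsd_type TEXT, min_level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO field_meta VALUES (?, ?, ?, ?)",
        [
            (1, "id", "xs:string", 0),
            (2, "title", "xs:string", 0),
            (3, "budget", "xs:decimal", 2),  # visible to level 2 and above
        ],
    )
    return conn


def schema_for_level(conn, level, root_name="record"):
    """Generate an XSD string describing only the fields that a user
    with the given clearance level is authorized to receive."""
    rows = conn.execute(
        "SELECT name, xsd_type FROM field_meta "
        "WHERE min_level <= ? ORDER BY position",
        (level,),
    ).fetchall()

    schema = ET.Element(f"{{{XS}}}schema")
    root = ET.SubElement(schema, f"{{{XS}}}element", name=root_name)
    ctype = ET.SubElement(root, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for name, xsd_type in rows:
        ET.SubElement(seq, f"{{{XS}}}element", name=name, type=xsd_type)
    return ET.tostring(schema, encoding="unicode")


conn = demo_metadata()
public_schema = schema_for_level(conn, 0)   # omits "budget"
manager_schema = schema_for_level(conn, 2)  # includes "budget"
```

The schema is built only at the moment a document is about to leave the system, which matches Hunsberger's point that no schema is needed before then.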

Hunsberger goes on to acknowledge that his situation "may sound like a problem of not having a powerful enough schema language, and in a way, it is." This is a fair summary of the problem. It may well be that any schema language sufficiently powerful to provide multiple descriptions of the data according to given parameters will turn out to be a horrendously complex specification. This requirement sits on the 20 side of the 80/20 rule and is neatly sidestepped by the relative ease of generating custom schemas.

Births, Deaths, and Marriages

Here are the latest announcements from the XML-DEV mailing list. This week I've thrown in a couple from the W3C's RDF interest list too, as news has been slow on XML-DEV.

eXist 1.0 beta 2: eXist is an open source native XML database, featuring index-based XQuery processing, XUpdate support, and much more. This release benefits from a lot of testing done by other projects, and fixes many instabilities and database corruptions that were still present in the previous version.
Redland RDF Application Framework 0.9.19: Redland is a C library that provides a high-level interface for the Resource Description Framework (RDF) allowing the RDF graph to be parsed, serialized, stored, queried, and manipulated. It comes with bindings for C#, Java, Perl, PHP, Python, Ruby, and Tcl. Changes include support for SPARQL RDF query and a move to LGPL/Apache licensing.
NG4J: Named Graphs API for Jena: NG4J is an extension to the Jena Semantic Web framework for parsing, manipulating, and serializing sets of Named Graphs. This release includes TriQL -- a query language for extracting information from sets of Named Graphs, based on RDQL (RDF Query Language).
XML 2004 Program Expands: Adds an Atom Hackathon and OASIS interoperability demonstrations. Late-breaking presentations include one from your humble correspondent.

Scrapings

Sony patents its own Semantic Web killer, right ... the web is too much like Canada ... Michael Kay agrees ... but will it soon be as full of disaffected Democrats? ... 146 messages to XML-DEV last week, seven percent Bauer flamebait ... much still to be done to clear DTDs' bad name ... next week your Deviant will be served fresh from XML 2004, complete with "humorous Tim Bray keynote" ... see you there.