Object Design's eXcelon 1.1
August 25, 1999
Object Design's eXcelon delivers industrial-strength storage for XML
Will XML become a full-fledged data-management discipline, as well as an exchange format? Not everyone agrees that it should. But object-database vendors, who have noticed that tree-like XML structures map nicely to the tree-like object data structures their engines can store, query, and manage, are busily repositioning their object-database products as "XML data servers." Examples include POET Software's Content Management Suite and the subject of this review, Object Design's eXcelon 1.1.
At the core of eXcelon is Object Design's highly regarded ObjectStore 4.0, a powerful and mature object-oriented database management system with language bindings to C++ and Java. These language bindings enable objects in C++ or Java programs to become persistent. For example, an in-memory Java object, such as a Hashtable, can be a window onto disk storage. An application that uses that object can access its parts just as though they were present in memory because, for the most part, they are. When a referenced part isn't already available, the object database automatically pages it in. One of ObjectStore's hallmark features is an aggressive "cache-forward" architecture that maximizes the in-memory performance of persistent objects, while allowing these cached objects to be shared by many such programs in a transactionally consistent way.
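As a loose analogy only, the idea of an ordinary-looking container that is really a window onto disk storage can be sketched with Python's standard shelve module. This is not ObjectStore or its Java binding, and shelve offers none of ObjectStore's caching or transactional sharing; the function names and the sample record below are made up for illustration.

```python
import os
import shelve
import tempfile


def remember(path, key, value):
    """Store value under key in a disk-backed, dict-like shelf."""
    with shelve.open(path) as shelf:
        shelf[key] = value


def recall(path, key):
    """Reopen the shelf and read the value back with plain dict syntax."""
    with shelve.open(path) as shelf:
        return shelf[key]


if __name__ == "__main__":
    # Hypothetical record; any picklable Python object would do.
    store = os.path.join(tempfile.mkdtemp(), "catalog")
    remember(store, "book-1", {"title": "Example Book", "pages": 300})
    print(recall(store, "book-1")["title"])
```

The calling code never issues an explicit read or write against a file; it just indexes into what looks like a dictionary, which is the programming model the paragraph above describes.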
Users familiar with ObjectStore can infer that it's the engine under eXcelon's hood, but the hood is shut tightly. Why? eXcelon aims squarely at a market (website content management) that's attuned not to object-oriented programming with persistent data, but rather to a hodge-podge of tagged-text files, scripts, images, and techniques. eXcelon aims, therefore, to enable users and integrators to build XML-oriented content repositories that leverage the strengths of ObjectStore: caching, transactional integrity, and querying.
Importing data into eXcelon
The primary user interface to eXcelon's content repository is the Explorer (see Figure 1), a Win32 application that presents a filesystem-like view of the repository and enables you to import both files and (by way of an Active Data Objects interface) SQL data. Shown in the figure are pieces of Microsoft's XML auction demo, which Object Design ported to eXcelon to illustrate how the product can support a real XML project involving a collection of XML and XSL files as well as related scripts and images.
Figure 1: eXcelon's Explorer gives a file system view of the repository and lets you import files and SQL data.
eXcelon does not require or validate against DTDs. Its embedded parser checks only for well-formedness, so when you store content in eXcelon it's your responsibility to check and maintain its validity. Given the wide availability of validating parsers, it would have been nice if the product could—at least optionally—use one.
A bit more unsettling is eXcelon's slightly cavalier attitude toward well-formedness. Nothing prevents you from importing an XML file that isn't well-formed. If you do that, you won't find out there's a problem until you invoke some operation that requires parsing, such as the tree view shown in Figure 2. Again, I think that a stricter policy would be valuable, at least as an option.
Figure 2: eXcelon's Explorer's expanded tree view.
I spoke with Ben Moore, managing associate at MMA—a consultancy that has implemented an eXcelon-based intranet for Wells Fargo—and he agreed that this is an issue. He recommends using an XML editor, such as SoftQuad's XMetaL, to ensure that what you put into eXcelon is at least well-formed, if not valid. SoftQuad, he notes, has announced an interface that will enable XMetaL to read from and save to the eXcelon repository instead of a file system.
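If an XML editor isn't part of your workflow, a small pre-import check can serve the same purpose. The following is a minimal sketch using Python's standard ElementTree parser, which tests only for well-formedness and does not validate against a DTD. The function name is invented for this example, and nothing here talks to eXcelon itself.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses, else a description of the first error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    good = "<ul><li><p>ok</p></li></ul>"
    bad = "<ul><li><p>broken</li></ul>"  # the p element is never closed
    for label, text in (("good", good), ("bad", bad)):
        problem = well_formedness_error(text)
        print(label, "->", "well-formed" if problem is None else problem)
```

Running a check like this over each file before importing it would catch the problem at the door rather than at the first tree view or query.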
Managing data in eXcelon
Once you've gotten eXcelon to parse a chunk of valid XML, the parse tree winds up as a set of persistent objects in ObjectStore. What are the advantages of that? For starters, eXcelon offers a very robust implementation of XQL. Figure 3 shows the outcome of the query //li[p]. This query, applied to the 1MB XML document that is the manuscript of my book, asks for the first instance of a list element that contains a paragraph.
Figure 3: eXcelon's Explorer tree with query results.
Clearly XQL is still in flux, but eXcelon's implementation is fast, robust, and as complete as any I've seen. Chapter 6 of the eXcelon User Guide is also the best XQL tutorial I've seen.
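For readers without eXcelon at hand, the shape of the //li[p] query can be approximated with the limited path syntax in Python's standard ElementTree. This is a stand-in, not XQL: ElementTree's path language is a small XPath subset, and the sample document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: one plain list item and two that contain a paragraph.
SAMPLE = """
<doc>
  <ul>
    <li>plain item</li>
    <li><p>first paragraph item</p></li>
    <li><p>second paragraph item</p></li>
  </ul>
</doc>
"""


def first_li_with_p(root):
    """Return the first li (in document order) that has a p child, or None."""
    return root.find(".//li[p]")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    hit = first_li_with_p(root)
    print(hit.find("p").text)
```

The bracketed [p] acts as a filter on li elements rather than a step into them, so the result is the list item itself, not the paragraph inside it.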
If you're storing XML in an object database, you'd like to be able to optimize queries by defining indexes. In eXcelon you can, as shown in Figure 4. Here the class attribute of the span tag is indexed. In theory that makes queries such as //p/span[@class="InlineType-BookTitle"] run faster than they otherwise would. In practice the difference wasn't noticeable on my 1MB test file, since eXcelon's caching makes everything seem pretty fast, but I expect that indexes will help when data sets grow very large.
Figure 4: Optimizing queries by defining indexes.
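To see why an attribute index might pay off on large data sets, consider this rough sketch of the general idea in Python. It says nothing about how eXcelon or ObjectStore actually implement indexes; it simply builds a dictionary from class attribute values to span elements once, so that later lookups avoid rescanning the tree. The function name and sample markup are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_class_index(root):
    """Map each class attribute value to the p/span elements that carry it."""
    index = defaultdict(list)
    for span in root.findall(".//p/span"):
        value = span.get("class")
        if value is not None:
            index[value].append(span)
    return index


if __name__ == "__main__":
    # Invented sample markup using the class value from the review's query.
    root = ET.fromstring(
        "<doc>"
        '<p><span class="InlineType-BookTitle">A</span>'
        '<span class="Other">B</span></p>'
        '<p><span class="InlineType-BookTitle">C</span></p>'
        "</doc>"
    )
    index = build_class_index(root)
    # One dictionary lookup replaces a scan of every span in the document.
    print([s.text for s in index["InlineType-BookTitle"]])
```

The unindexed equivalent is a full traversal per query, which is cheap on a 1MB file held in cache but grows with the size of the document.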
eXcelon can update XML as well as query it. XQL doesn't address updating, and while XML-QL does, Object Design has instead opted to implement its own proposed method. Here is a simple eXcelon update expression:
<?xml version="1.0"?>
<xlnupdate version="1.0">
  <update select="//li[p]">
    <element>
      <p>This is the replacement paragraph.</p>
    </element>
  </update>
</xlnupdate>
Here the first paragraph contained within a list element is replaced with the contents of the update expression's element tag.
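The effect described above can be imitated outside eXcelon with a few lines of Python and the standard ElementTree module. This is a sketch of the described outcome only, based on the reading that the first paragraph of the first matching list element is swapped out. It does not interpret eXcelon's update language, and the function name is invented.

```python
import xml.etree.ElementTree as ET


def replace_first_li_paragraph(root, new_text):
    """Replace the first p child of the first li that has one.

    Returns True if a replacement happened, False if nothing matched.
    """
    li = root.find(".//li[p]")
    if li is None:
        return False
    old = li.find("p")
    new = ET.Element("p")
    new.text = new_text
    new.tail = old.tail  # preserve whitespace that followed the old node
    position = list(li).index(old)
    li.remove(old)
    li.insert(position, new)
    return True


if __name__ == "__main__":
    root = ET.fromstring("<ul><li><p>old text</p></li><li><p>untouched</p></li></ul>")
    replace_first_li_paragraph(root, "This is the replacement paragraph.")
    print(ET.tostring(root, encoding="unicode"))
```

Note that the replacement is built as a real element rather than spliced in as text, so this sketch cannot produce ill-formed output the way a raw textual substitution could.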
The update language supports foreach, update, and remove verbs, each of which can take an optional select attribute containing XQL syntax. The query context varies as the update expression runs, so order of execution matters. You can modify nodes in place (as shown above), create new nodes, remove nodes, and even use query results from one document in the repository to update another document. You have lots of power, and plenty of rope to hang yourself. Indeed, eXcelon's lack of a validating parser, even for its own update language, prompted this scary bit of advice in Chapter 7 of the User Guide:
"When you want to create a new node, ensure that you specify the location attribute in the element, comment, text, or cdata element. It is easy to mistakenly specify it in the update element. If you do, it is well-formed XML, so the parser does not catch the error. The update facility ignores it and because there is no location attribute where it should be, the default of replace is assumed. The consequence of putting the location attribute in the wrong element is that you overwrite the current node or set of nodes."
Ouch! Even worse, I found that well-formedness is not guaranteed. The product allowed me to replace a well-formed construct with an ill-formed one. Clearly the well-formedness and validity of its own XML-based update syntax ought to be top priorities for eXcelon 1.2. Still, the update language is undeniably a powerful tool. Until a consensus standard emerges for updating XML, it's a reasonable approach that—if used with care—should deliver much value.
Next, we'll go into how you build an application with eXcelon.
Building applications with eXcelon
Nothing we've discussed so far requires programming beyond the level of XQL and update expressions. But eXcelon is, appropriately, full of hooks that can support applications layered on top of it. These applications fall into two categories: server extensions, and client applications that talk to the server and its extensions.
You can build customized clients in two ways: as Web applications, and as COM-based Win32 applications. Of the two methods, the first is by far the more accessible. The reason is that eXcelon comes with a CGI adapter in the form of an ISAPI DLL, a Netscape NSAPI plug-in, or an Apache module, all for NT, which is the only supported eXcelon platform. These adapters create URL-accessible Web APIs grouped into four request types:
- get, to retrieve a document from the repository
- query, to run an XQL query
- update, to run an XML update
- run, to run a server extension
The same query shown in Figure 3 can, alternatively, run straight from a browser as shown in Figure 5. Because that's so, eXcelon is also easily accessible to HTTP-aware scripting languages such as Perl or Python.
Figure 5: Browser view of a query.
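Because the four request types are reachable by URL, a script needs little more than string handling to address them. The sketch below shows the general pattern in Python. Be warned that the URL layout, the host name, and the xql parameter name are all hypothetical placeholders, not eXcelon's documented syntax; only the four request type names come from the product.

```python
from urllib.parse import quote, urlencode

# The four request types listed in the review.
REQUEST_TYPES = ("get", "query", "update", "run")


def build_request_url(base_url, request_type, document_path, **params):
    """Compose a request URL. The path layout here is an assumed convention."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type!r}")
    url = f"{base_url.rstrip('/')}/{request_type}/{quote(document_path.lstrip('/'))}"
    if params:
        # urlencode percent-escapes characters such as / [ ] in the query text.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    # Hypothetical server address, document path, and parameter name.
    print(build_request_url("http://localhost/xln", "query",
                            "/books/manuscript.xml", xql="//li[p]"))
```

The point that carries over regardless of the real URL syntax is that query text must be percent-encoded before it goes into a URL, since XQL expressions are full of slashes and brackets.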
Given the Web API, why would you want to write a COM-based client? These applications can wield a richer API. They can create and rearrange files and directories in the repository, and work with the same management features (security, user roles, indexing, cache management) available in the eXcelon Explorer and its companion management tool, eXcelon Manager, an MMC (Microsoft Management Console) snap-in. An integrator can use the client API from Visual Basic to integrate an organization's content-management workflow with the eXcelon repository.
On the server side, eXcelon supports extensions written in COM or Java. These extensions can issue XQL queries, and use the W3C DOM API to navigate and modify resulting XML content. Server extensions are the eXcelon equivalent of SQL stored procedures. Operations that can't be expressed atomically as standard queries or updates can be made into extensions, which are then callable from clients (or from other server extensions). An example of an extension is the addbid item shown in Figure 1. It receives a bid, queries the auction data, and inserts the new bid if it's higher than the existing ones in its category.
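The logic of an extension like addbid is easy to picture in miniature. The following Python sketch mimics the described behavior (accept a bid only if it beats the existing ones) against an in-memory ElementTree. It is not eXcelon's COM or Java extension API, the element and attribute names are guesses rather than the auction demo's real schema, and it has none of the transactional protection a real server extension would rely on.

```python
import xml.etree.ElementTree as ET


def add_bid_if_highest(item, bidder, amount):
    """Append a bid element to item only when amount beats every existing bid.

    Returns True if the bid was inserted, False if it was rejected.
    """
    existing = [float(bid.get("amount")) for bid in item.findall("bid")]
    if existing and amount <= max(existing):
        return False
    ET.SubElement(item, "bid", {"bidder": bidder, "amount": f"{amount:.2f}"})
    return True


if __name__ == "__main__":
    # Hypothetical item with one existing bid.
    item = ET.fromstring('<item title="Vase"><bid bidder="ann" amount="10.00"/></item>')
    print(add_bid_if_highest(item, "bob", 9.50))   # lower than the current high bid
    print(add_bid_if_highest(item, "cat", 12.00))  # higher, so it is inserted
    print(ET.tostring(item, encoding="unicode"))
```

The check and the insert must happen as one unit, which is why this kind of operation belongs in a server extension rather than in separate query and update requests from a client.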
Designing XML structures with eXcelon
eXcelon Studio, shown in Figure 6, is a kind of entity-relationship modeller, which in this example presents a schematic view of the auction demo's XML content. The meaning of the diagram is that an item has a one-to-one relationship to a set of bids. The terminology won't make immediate sense to DTD-oriented XML designers, and in fact this diagram does not generate a DTD. Rather, it generates a Document Content Description (DCD) file. DCD was proposed to the W3C as a way to integrate the Resource Description Framework (RDF) with XML-Data.
Figure 6: eXcelon Studio's schematic view.
In addition to the DCD file, Studio can also generate code skeletons for Java and COM server extensions, and Web-based applications, that operate on instances of the schema. But all this is really just a trial balloon. eXcelon's DCD-based schemas offer little of the expressiveness of a DTD, focusing almost exclusively on defining one-to-one and one-to-many relationships among elements. This makes sense, in ObjectStore terms, when you consider that a relationship among objects is a native, highly optimized feature of the underlying engine. But Studio in its current form, untethered to current practices of XML document design and very limited in its expressiveness, seems unlikely to see much use.
XML and OODBMS: A marriage made in heaven?
I don't think anybody really knows whether, or to what extent, or exactly how a full-fledged data-management discipline should ultimately surround XML. But it's clear that there's going to be a ton of XML content, and that relational databases are not naturally attuned to the storage and management of that content.
Like a lot else about XML—querying, formatting, namespaces, schemas—the issue of native XML storage is very much in flux. That said, eXcelon appears to be a solid product that can be used today to help applications talk to XML data stores more quickly, more easily, and more safely.
Will XML data management emerge as a discipline in its own right? Will XML's high profile help bring OODBMS technology out of the niches it has so far inhabited, and into the mainstream? I'd like to see these things happen, and I applaud Object Design's first step in that direction.