XML.com: XML From the Inside Out

Using XML for Object Persistence

September 08, 1999

Applied XML Tutorial


Contents

Part 1: Using XML for Object Persistence
Part 2: Serialization Problems
Part 3: Roll Your Own Generic XML Data Format

What does "object persistence" or "serializing an object" mean, and how can XML help with it? Several existing technologies help you serialize objects into XML strings. They cover Java, CORBA, and COM objects, and they are worth a look.

But right now, if you don't yet feel comfortable diving into pages and pages of documentation, stay with me. I'll take you on a tour through object persistence so you'll know what to expect from the technologies already available. You'll also learn how to implement a persistence strategy yourself.

The article is accompanied by an XML schema and sample code written in Visual Basic 6.0 using Microsoft's MSXML component.

Object Persistence = Storing an Object's Data

Most of you are familiar with using objects in your programs. Often you use third-party objects, like an XML parser object or a database API wrapped in an object model, such as Microsoft's ActiveX Data Objects (ADO). Or you set up an object model for your application yourself, as Microsoft Word does. Your object model can be viewed as the ideal in-memory representation of your application's data. Maybe you are programming the next killer 3D application and your object model looks like the one in Figure 1.


Figure 1: A simple object model.

An object model like this works very well as long as the application is running. But what do you do when the user wants to close the program? You have to store your objects or, to be more precise, the data of your objects, somewhere such as a file or a database. While objects live in memory, data and code (object methods) stay together in "little boxes" (the objects). But when you store an object, you store only its data.

Later, you might create a new, empty object and load into it the data previously stored by another object. This associates the data with code again.

When storing an object, you separate data from code. Object persistence is all about extracting the information in an object so it is not lost when the object itself is destroyed. Once the data is separated from the object, it can be saved in a file or sent over the Internet to another computer. Sometimes objects need to store not only their data but also their code. That has its uses too (for example, in mobile agent scenarios), but we won't discuss it in this article.
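To make the separation of data and code concrete, here is a minimal sketch in Python rather than the article's Visual Basic. The `Customer` class and its `to_data`/`from_data` methods are hypothetical names chosen for illustration. Extracting the data yields a plain dictionary with no methods attached, and a new, empty object becomes a complete object again only once that data is loaded into it.

```python
class Customer:
    """Hypothetical object: data (name, balance) plus code (methods)."""

    def __init__(self):
        # A new, empty object: the code is present, the data is not loaded yet.
        self.name = ""
        self.balance = 0.0

    def deposit(self, amount):
        # Code that works on the data; it is never part of what gets stored.
        self.balance += amount

    def to_data(self):
        # Extract only the data: a plain dict with no methods attached.
        return {"name": self.name, "balance": self.balance}

    def from_data(self, data):
        # Associate previously stored data with this object's code.
        self.name = data["name"]
        self.balance = data["balance"]


original = Customer()
original.name = "Alice"
original.deposit(100.0)
stored = original.to_data()   # data separated from code
del original                  # the object itself is destroyed

restored = Customer()         # a new, empty object
restored.from_data(stored)    # data and code are together again
restored.deposit(50.0)        # the restored object's code works on the loaded data
```

Only `stored` would need to survive between program runs; the methods come back from the class definition, not from the stored data.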

Serializing an Object's Data

You may never have realized it, but you have been working with object persistence all along. Whenever you saved an object's data, you made it persistent. Maybe you have a customer object in your application. Every time you issue an SQL UPDATE statement to your RDBMS to store the customer's data, you're persisting the object's data.
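As a sketch of this everyday kind of persistence, the following Python snippet uses the standard `sqlite3` module as a stand-in for your RDBMS. The `customers` table, the `Customer` class, and the `save`/`load` helpers are hypothetical. Each property of the object maps to a column, and an UPDATE statement writes the object's current data back to its row.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical customer object; each field maps to a table column."""
    id: int
    name: str
    city: str


def save(conn, customer):
    # Persist the object's data with an UPDATE, one column per property.
    conn.execute(
        "UPDATE customers SET name = ?, city = ? WHERE id = ?",
        (customer.name, customer.city, customer.id),
    )
    conn.commit()


def load(conn, customer_id):
    # Create a new object from the column values stored in the row.
    row = conn.execute(
        "SELECT id, name, city FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return Customer(*row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 'Boston')")

c = load(conn, 1)
c.city = "Chicago"   # change the in-memory object
save(conn, c)        # persist its data
```

Note that the row's columns are not laid out one after the other in a single string, which is why the article reserves the word "serialization" for a different form of storage.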

When serializing an object, however, the focus is less on storing an object's data on non-volatile media than on how the object's in-memory data structure differs from the form the data takes once it has been extracted from the object. Figure 2 shows the difference.


Figure 2: Serializing an object's data

In memory, the data sits at arbitrary addresses, organized into structures you can think of as arrays, records, objects, and so on. Those data structures cannot be stored directly. You can only store data of simple types, such as integers, floating-point numbers, or strings. An array of strings has to be broken up into its parts, which are of a simple type. Objects, as another kind of complex data structure or as containers of other data structures, cannot simply be stored either. So we have to break them up into their data parts (properties) and store those individually.
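The breaking-up step can be sketched in Python (not the article's Visual Basic) as a recursive walk that descends into containers until only simple-typed values remain. The `simple_parts` function and `Sample` class are hypothetical; the property names `a`, `pi`, `msg`, and `myarray` are the ones used by the article's `Serialize` function, but their values here are made up.

```python
SIMPLE_TYPES = (int, float, str)


def simple_parts(value):
    """Yield the simple-typed values contained in a possibly nested structure."""
    if isinstance(value, SIMPLE_TYPES):
        yield value                      # already storable as-is
    elif isinstance(value, dict):
        for item in value.values():      # e.g. the fields of a structure
            yield from simple_parts(item)
    elif isinstance(value, (list, tuple)):
        for item in value:               # an array: one cell at a time
            yield from simple_parts(item)
    elif hasattr(value, "__dict__"):
        # An object: break it up into its data parts (its attributes).
        yield from simple_parts(vars(value))
    else:
        raise TypeError(f"cannot break up a {type(value).__name__}")


class Sample:
    """Hypothetical object with the properties used in the Serialize function."""

    def __init__(self):
        self.a = 1
        self.pi = 3.14
        self.msg = "hello"
        self.myarray = ["x", "y", "z"]


parts = list(simple_parts(Sample()))
```

Here `parts` holds only integers, floats, and strings, each of which can be written out on its own.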

Now, when you want to store several data items in a file, you write them one after the other. A complex data structure, for example a multidimensional array, thus gets written out one array cell at a time. That's what serialization means: the simple data types in complex data structures get lined up like pearls on a string. Look at Figure 2 and notice how the serialized data items are listed in the string on the right side. Quite literally, the serialized form of an object is a one-dimensional representation of its (potentially very) complex data, including information on how the data was originally arranged in memory (for example, in arrays or user-defined structures). This information is needed later, when you want to read the data back into some other object.
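Why the arrangement information matters can be shown with a small Python sketch (the article itself uses Visual Basic). Two differently shaped arrays line up as the same sequence of values unless the serializer also records where each array begins and ends. The functions and the `"BEGIN"`/`"END"` markers are hypothetical; a real format needs markers that cannot be confused with data values.

```python
def flatten(value):
    """Line up the simple values only, discarding how they were arranged."""
    if isinstance(value, list):
        return [x for item in value for x in flatten(item)]
    return [value]


def flatten_with_structure(value):
    """Line up the values, emitting markers where each array begins and ends."""
    if isinstance(value, list):
        out = ["BEGIN"]
        for item in value:
            out.extend(flatten_with_structure(item))
        out.append("END")
        return out
    return [value]


one_array = [1, 2, 3, 4]        # one array with four cells
two_arrays = [[1, 2], [3, 4]]   # a 2 x 2 array

# Without structure information, both line up as 1, 2, 3, 4 ...
print(flatten(one_array) == flatten(two_arrays))                                 # True
# ... with it, the two arrangements stay distinguishable.
print(flatten_with_structure(one_array) == flatten_with_structure(two_arrays))   # False
```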

The serialization code for the object in Figure 2 could look like this (all code in this column is Visual Basic, and it is simple enough that you should be able to follow it even if you program in a different language):

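' Serializes this object's data (its member variables a, pi, msg
' and myarray) into one string, as shown in Figure 2.
Function Serialize() As String
    Dim s As String

    s = "<"                                 ' start of the object
        s = s & "<" & a & ">"               ' simple properties
        s = s & "<" & pi & ">"
        s = s & "<" & msg & ">"
        s = s & "<"                         ' start of the array
            s = s & "<" & myarray(0) & ">"
            s = s & "<" & myarray(1) & ">"
            s = s & "<" & myarray(2) & ">"
        s = s & ">"                         ' end of the array
    s = s & ">"                             ' end of the object

    Serialize = s
End Function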
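The function above hard-codes each property and array cell. A generic version can recurse through any nesting instead. The following is a Python transliteration under stated assumptions: the object's data is given as nested lists of simple values, `serialize` is a hypothetical name, and, because this simple format has no escaping, values that contain `<` or `>` are rejected.

```python
def serialize(value):
    """Serialize nested lists of simple values into the <...> format of Figure 2."""
    if isinstance(value, list):
        # A container: wrap the serialized children in one pair of brackets.
        return "<" + "".join(serialize(item) for item in value) + ">"
    text = str(value)
    if "<" in text or ">" in text:
        # The format has no escaping, so such values cannot be represented.
        raise ValueError(f"value contains a delimiter: {text!r}")
    return "<" + text + ">"


# Hypothetical values standing in for a, pi, msg and the three cells of myarray.
data = [1, 3.14, "hello", ["x", "y", "z"]]
print(serialize(data))   # <<1><3.14><hello><<x><y><z>>>
```

The output has the same shape as the string built by the Visual Basic `Serialize` function, but the recursion handles arrays of any length and depth.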

The characters in the string are the serialized representation of the object's data. We could also have stored the data in a database table; that would have been just another persistence medium, but because the data would not be stored there in a one-dimensional fashion, we would not really call it serialization.

Note the implications of transforming an object's data into this kind of serialized form (though not necessarily a string): it can be stored in a file or in a field of a database table, or it can be sent over the Internet. Whether it is a byte array or a string, it is easy to handle in many ways on all kinds of platforms.
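As an illustration in Python rather than Visual Basic, here is the same serialized string written to a file, stored in a single field of a database table, and turned into a byte array such as a network message might carry. The string, the file name `object.dat`, and the `objects` table are hypothetical; the point is only that one flat value fits all three media unchanged.

```python
import os
import sqlite3
import tempfile

serialized = "<<1><3.14><hello><<x><y><z>>>"   # hypothetical serialized object

# 1. In a file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "object.dat")
    with open(path, "w", encoding="utf-8") as f:
        f.write(serialized)
    with open(path, encoding="utf-8") as f:
        from_file = f.read()

# 2. In one field of a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO objects (id, data) VALUES (?, ?)", (1, serialized))
from_db = conn.execute("SELECT data FROM objects WHERE id = 1").fetchone()[0]

# 3. As a byte array, e.g. the payload of a network message.
payload = serialized.encode("utf-8")
from_wire = payload.decode("utf-8")
```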

You can also think of the serialized data as a dehydrated object. All the water (the code, in this metaphor) has been pressed out of the object, leaving only salt and minerals (the data). Later, when you want to get the whole object back, you rehydrate it: you add water to the serialized data by creating an empty object and deserializing the data into it.
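Rehydration can be sketched in Python (the article's own code is Visual Basic) as two steps: parse the string back into nested values, then create an empty object and assign those values to its properties. The `deserialize` function, the `Sample` class, and its `rehydrate` method are hypothetical. Because the format stores every value as text, the object itself has to convert each property back to its type.

```python
def deserialize(text):
    """Parse the <...> format back into nested lists of strings."""

    def parse(pos):
        # Each item starts with "<"; return (value, index just past the item).
        if pos >= len(text) or text[pos] != "<":
            raise ValueError(f"expected '<' at position {pos}")
        pos += 1
        if pos < len(text) and text[pos] == "<":
            # A container: parse child items until its closing ">".
            items = []
            while pos < len(text) and text[pos] == "<":
                item, pos = parse(pos)
                items.append(item)
            if pos >= len(text) or text[pos] != ">":
                raise ValueError(f"expected '>' at position {pos}")
            return items, pos + 1
        # A simple value: everything up to the next ">".
        # Note: "<>" is read back as an empty string, never as an empty list.
        end = text.find(">", pos)
        if end == -1:
            raise ValueError("unterminated value")
        return text[pos:end], end + 1

    value, end = parse(0)
    if end != len(text):
        raise ValueError("unexpected characters after the serialized data")
    return value


class Sample:
    """Hypothetical object with properties a, pi, msg and myarray."""

    def __init__(self):
        # A new, empty object.
        self.a = 0
        self.pi = 0.0
        self.msg = ""
        self.myarray = []

    def rehydrate(self, text):
        # Add water: load the stored data into this object's properties.
        a, pi, msg, myarray = deserialize(text)
        # Only strings were stored, so the object restores each type itself.
        self.a = int(a)
        self.pi = float(pi)
        self.msg = msg
        self.myarray = list(myarray)


obj = Sample()                                   # a new, empty object
obj.rehydrate("<<1><3.14><hello><<x><y><z>>>")   # deserialize the data into it
```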
