
Inside OpenMLS's XML Application

August 12, 1998

Lisa Rein and Tim Bray

If you have had a chance to look at the demo, you will have seen that it starts with an HTML form interface. The form lets the buyer fine-tune searches so that they are more likely to return useful real estate listings. On the back end is the XSearch! engine, which processes XML on the server side.


XSearch! can deliver XML or serve its results as HTML 3.0, so they display properly in Netscape Navigator or Internet Explorer 3 or later. The XSearch! implementation requires no scripts. Although the online demo uses Active Server Pages, the implementation is not tied to them in any way.

The OpenMLS system is deployed on the Microsoft BackOffice suite, while the online demo uses Microsoft Index Server and the (validating) MS xmldso Java parser.

The actual XML that is pushed to the web for crawling and indexing could be generated in any number of ways and by any number of systems. The trick is using a standardized XML vocabulary to make listings interoperable with one another, and that is what the RELML DTD provides.

The OpenMLS Listing Management System also requires Internet Explorer 4.01 or higher; it tests for older browser versions and redirects them accordingly. It uses the MSXML parser that ships with IE 4.01 (the very same parser used by the server-side-only implementation).

The system generates both an XML page and an HTML 3.0 web page for each listing as it is entered, so the OpenMLS product immediately gives a real estate office a World Wide Web presence for every listing. If a listing is modified, new XML and HTML documents automatically replace the outdated pages, so consumers browsing the web always see the most current information available.
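The article doesn't show how OpenMLS produces the two pages, so here is a minimal Python sketch under stated assumptions: a listing arrives as a simple dictionary, and a plain dictionary stands in for the web server's document store. The element names LISTING, ADDRESS and PRICE, the listing fields, and the relml.dtd file name are hypothetical; only NUM-BEDS comes from this article. Republishing under the same listing ID overwrites both pages, mirroring the replace-on-modify behavior described above.

```python
import html
import xml.etree.ElementTree as ET

def render_xml(listing: dict) -> str:
    """Render a listing as XML. Element names are hypothetical, except NUM-BEDS."""
    root = ET.Element("LISTING", {"ID": listing["id"]})
    ET.SubElement(root, "ADDRESS").text = listing["address"]
    ET.SubElement(root, "PRICE").text = str(listing["price"])
    ET.SubElement(root, "NUM-BEDS").text = str(listing["beds"])
    body = ET.tostring(root, encoding="unicode")  # escapes text and attributes
    # The DOCTYPE points at the vertical-market DTD (file name is an assumption).
    return '<?xml version="1.0"?>\n<!DOCTYPE LISTING SYSTEM "relml.dtd">\n' + body

def render_html(listing: dict) -> str:
    """Render the same listing as plain, script-free HTML 3.0-era markup."""
    addr = html.escape(listing["address"])
    return (
        f"<HTML><HEAD><TITLE>{addr}</TITLE></HEAD><BODY>\n"
        f"<H1>{addr}</H1>\n"
        f"<P>Price: ${listing['price']:,}</P>\n"
        f"<P>Bedrooms: {listing['beds']}</P>\n"
        "</BODY></HTML>\n"
    )

def publish(listing: dict, site: dict) -> None:
    """Write both pages; re-publishing an edited listing replaces the old ones."""
    site[listing["id"] + ".xml"] = render_xml(listing)
    site[listing["id"] + ".html"] = render_html(listing)

# Usage: enter a listing, then modify its price.
site = {}
publish({"id": "L100", "address": "12 Elm St", "price": 250000, "beds": 3}, site)
publish({"id": "L100", "address": "12 Elm St", "price": 239000, "beds": 3}, site)
```

After the second call the store still holds exactly two pages for listing L100, both reflecting the new price.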

The Real Estate Listing Markup Language (RELML) DTD

Both XSearch! and the OpenMLS Listing Management System use the Real Estate Listing Markup Language, or RELML, to mark up, aggregate, and index their listings.

The OpenMLS server uses DTDs to validate the listings entered by listing agents. When agents enter listing data, the input is sent to SQL Server; the data is then read back from SQL Server and formatted as XML (and HTML).
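The data flow described here (agent input stored in the database, then read back and formatted as XML) can be sketched as follows. This is not OpenMLS code: Python's built-in sqlite3 module stands in for SQL Server, and the table, column, and element names are hypothetical. The sketch also skips the DTD validation step, since Python's standard library ships no validating XML parser.

```python
import sqlite3
import xml.etree.ElementTree as ET

# sqlite3 stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id TEXT PRIMARY KEY, address TEXT, beds INTEGER)")

def enter_listing(listing_id: str, address: str, beds: int) -> None:
    """Store an agent's input; re-entering an ID overwrites the old row."""
    conn.execute(
        "INSERT OR REPLACE INTO listings (id, address, beds) VALUES (?, ?, ?)",
        (listing_id, address, beds),
    )
    conn.commit()

def listing_to_xml(listing_id: str) -> str:
    """Read a listing back from the database and format it as XML."""
    row = conn.execute(
        "SELECT id, address, beds FROM listings WHERE id = ?", (listing_id,)
    ).fetchone()
    if row is None:
        raise KeyError(listing_id)
    root = ET.Element("LISTING", {"ID": row[0]})
    ET.SubElement(root, "ADDRESS").text = row[1]
    ET.SubElement(root, "NUM-BEDS").text = str(row[2])
    return ET.tostring(root, encoding="unicode")

enter_listing("L7", "4 Oak Ave", 2)
enter_listing("L7", "4 Oak Ave", 3)  # the agent corrects the bedroom count
print(listing_to_xml("L7"))
```

Keeping the database as the single source of truth means the XML (and HTML) can always be regenerated from the latest row.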

OpenMLS also needed its application to work with the several different ways listings might be produced: by users making MLS entries with custom MLS software, by those making MLS entries with standard XML editors, and by those who are simply building a web site.

Using a DTD to associate a particular vertical market's meaning with a document's content turned out to be a tricky business indeed.

"In developing the DTD, I had to determine the common denominator for the various MLS forms across the country. It was a balance between the general and the specific, and I tended towards the general," John Petit, principal designer of the RELML DTD, explains. "I feel that the path to standardization in the real estate industry will be taken with small steps. If the proposed standards are too precise or restrictive, there will be too many points of contention."

Crawling and Searching

Another common point of confusion is the difference between "crawling" and "searching," and the very specific role each of these functions plays in the implementation.

"The distinguishing factor that differentiates 'crawling' from 'searching' is that the search is performed by a separate program on the database that has been gathered by the crawler," Petit explains. "More than likely, there will be one crawler that will index both HTML and XML but put them into separate indexes. The search is performed on these centralized indexes."

The crawler gathers everything; during a search, however, the engine must look for the <!DOCTYPE> declaration that references the vertical market's DTD and then refine the search to that market. So although HTML and XML can be crawled and indexed together, they are searched separately.
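Here is a small Python sketch of the crawl/search split, assuming the crawler has already fetched each page's text. Each XML page carrying a DOCTYPE declaration is filed under an index named for its DTD's system identifier, and everything else goes into a single HTML index; searching is a separate function that consults only the index for the requested DTD. The relml.dtd name and the sample pages are hypothetical, and a real engine would build inverted indexes rather than scanning raw text.

```python
import re
from collections import defaultdict

# Captures the DTD's system identifier from a DOCTYPE declaration, e.g.
# <!DOCTYPE LISTING SYSTEM "relml.dtd">  ->  relml.dtd
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+[^\s>]+\s+(?:SYSTEM\s+|PUBLIC\s+"[^"]*"\s+)"([^"]+)"'
)

def index_name(doc: str) -> str:
    """Pick the index a crawled page belongs in."""
    m = DOCTYPE_RE.search(doc)
    if doc.lstrip().startswith("<?xml") and m:
        return m.group(1)  # one index per vertical-market DTD
    return "html"          # ordinary web pages share one index

def crawl(pages: dict) -> dict:
    """Crawl everything once, but file each page under its own index."""
    indexes = defaultdict(dict)
    for url, doc in pages.items():
        indexes[index_name(doc)][url] = doc
    return indexes

def search(indexes: dict, dtd: str, text: str) -> list:
    """A separate step: query only the index for one market's DTD."""
    return sorted(url for url, doc in indexes.get(dtd, {}).items() if text in doc)

pages = {
    "a.xml": '<?xml version="1.0"?>\n<!DOCTYPE LISTING SYSTEM "relml.dtd">\n'
             "<LISTING><NUM-BEDS>3</NUM-BEDS></LISTING>",
    "b.html": "<HTML><BODY>3 bedrooms</BODY></HTML>",
}
idx = crawl(pages)
print(search(idx, "relml.dtd", "3"))  # ['a.xml']; the HTML page is never consulted
```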

When search engines crawl the Web looking for XML-based documents, returning precise results can still be a problem, primarily because different vertical markets could be using the same XML tag names to mean different things.

For example, <BRAND> in one industry could mean something completely different in another. To eliminate this confusion and allow searches within a specific vertical market, the engine must identify and index each document's external DTD reference.
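One way an engine could honor this, sketched in Python under stated assumptions: index each element by the triple (DTD, tag name, text), so identical tag names from different vertical markets never share an index entry. The DTD identifiers relml.dtd and grocery.dtd are hypothetical, and each document's DTD is assumed to have already been extracted from its DOCTYPE declaration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_elements(docs: dict) -> dict:
    """Map (dtd, tag, text) to the set of URLs containing that element.

    docs maps each URL to a (dtd_system_id, xml_text) pair.
    """
    index = defaultdict(set)
    for url, (dtd, xml_text) in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            text = (elem.text or "").strip()
            if text:
                index[(dtd, elem.tag, text)].add(url)
    return index

docs = {
    "house.xml": ("relml.dtd", "<LISTING><BRAND>Acme</BRAND></LISTING>"),
    "soup.xml": ("grocery.dtd", "<ITEM><BRAND>Acme</BRAND></ITEM>"),
}
index = index_elements(docs)
# The same <BRAND>Acme</BRAND> yields two distinct keys, one per market.
print(sorted(index[("relml.dtd", "BRAND", "Acme")]))  # ['house.xml']
```

Dropping the DTD from the key would merge both documents under ("BRAND", "Acme"), which is exactly the ambiguity described above.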

Consumers can be given a typical form-driven interface for searching a vertical market. There are several ways to build one; the traditional approach uses text boxes or list boxes that prompt for specific information. In this scenario, if the consumer selects "3" for the number of bedrooms, the web server interprets the choice and sends the following string to the index server:

<NUM-BEDS>3</NUM-BEDS>

The index server returns only those documents that contain this literal string. Another method is to accept "free-form" text input (which is all that searching HTML allows), but that can return ambiguous results, because the input is not tied to specific tags. For example, entering "3" can return 3 bedrooms, 3 bathrooms, or 3 pit bulls that live next door.
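The contrast between the two input styles fits in a few lines of Python. The mapping from form fields to tag names is hypothetical (only NUM-BEDS appears in this article), and plain substring matching stands in for whatever the index server actually does.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from HTML form fields to RELML-style tag names.
FIELD_TAGS = {"bedrooms": "NUM-BEDS", "bathrooms": "NUM-BATHS"}

def tagged_query(field: str, value: str) -> str:
    """Wrap a form value in the tag for its field, e.g. <NUM-BEDS>3</NUM-BEDS>."""
    tag = FIELD_TAGS[field]
    return f"<{tag}>{escape(value)}</{tag}>"

def literal_search(docs: dict, needle: str) -> list:
    """Return URLs of documents containing the literal search string."""
    return sorted(url for url, body in docs.items() if needle in body)

docs = {
    "a.xml": "<LISTING><NUM-BEDS>3</NUM-BEDS><NUM-BATHS>2</NUM-BATHS></LISTING>",
    "b.xml": "<LISTING><NUM-BEDS>2</NUM-BEDS><NUM-BATHS>3</NUM-BATHS></LISTING>",
}
print(literal_search(docs, tagged_query("bedrooms", "3")))  # ['a.xml']
print(literal_search(docs, "3"))  # ['a.xml', 'b.xml']: free-form text is ambiguous
```

Because the match is on a literal string, a listing written as <NUM-BEDS> 3 </NUM-BEDS> would not be found; a production index would normalize element content before matching.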

"This DTD, if widely adopted, is one small step. Certainly there is more to add, such as strong data typing and internationalization, and 4thWORLD will continue to refine the real estate DTD as the market and changing technologies dictate. Indeed, we may opt for better document description syntax, such as that provided by XML-Data or DCD (Document Content Description), when these specifications have been fully standardized through the proper W3C channels."

"As it stands, the current DTD is quite useful, as the OpenMLS software demonstrates. I also had to show how the XML documents conforming to this DTD could be customizable so that various Boards of Realtors could create their own MLS form (the OpenMLS software demonstrates how this works). All of our efforts are to ensure that real estate sites developed by 4thWORLD are seamlessly integrated with the workings of real estate agents and that the pertinent information is universally accessible to buyers on the web."

Take a look at the RELML DTD and some sample XML output.