Text retrieval with new wrinkles: Documensa’s Edibase SGML

January 10, 1998

Seybold Report on Internet Publishing
Vol 2, No 5
January, 1998

Documensa, from Montreal, showed Edibase SGML, an SGML-based document retrieval system with some novel features. The software consists of two modules: the Indexer and the Searcher. The Indexer is used for creating textbases, the Searcher for retrieving documents from them. A textbase consists of a collection of up to 850,000 documents (totaling a maximum of 2 GB), each of which conforms to the same DTD. (A wide variety of reference works would fit this framework.) The software can be used with CD-ROMs, with data on disk on a single machine, or across a network in a client-server arrangement.

In addition to the features commonly found in text search engines, Edibase SGML has some uncommon ones. There is great flexibility in how search results are displayed. For example, the order in which the "hits" are sorted can be user-defined, and the order of elements inside each hit can be changed for display. Up to 50 indexes per textbase can be defined, and each can be a combination of several elements and literal text strings. Searches can be qualified by the SGML context, and lists of results can be broken down into groups according to the value of a specified element.
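
As a rough illustration of what context-qualified searching, user-defined sort order and grouped result lists mean in practice, here is a minimal Python sketch. It is not Edibase SGML's interface, which was not shown in code form; the sample data and the names TEXTBASE, search, sort_hits and group_hits are all hypothetical, and documents are flattened to simple element-to-text mappings rather than nested SGML.

```python
from collections import defaultdict

# Hypothetical miniature "textbase". Every document has the same elements,
# standing in for documents that conform to one DTD. Real SGML is nested;
# a flat mapping from element name to text is a simplifying assumption.
TEXTBASE = [
    {"title": "Maple", "category": "tree", "body": "A tree with red autumn leaves."},
    {"title": "Rose", "category": "shrub", "body": "A thorny shrub with red flowers."},
    {"title": "Oak", "category": "tree", "body": "A tree that bears acorns."},
    {"title": "Red Pine", "category": "tree", "body": "An evergreen conifer."},
]


def search(docs, term, element):
    """Return documents in which `term` occurs inside the named element only."""
    term = term.lower()
    return [d for d in docs if term in d.get(element, "").lower()]


def sort_hits(hits, element):
    """Order hits by the text of an element chosen by the user."""
    return sorted(hits, key=lambda d: d.get(element, ""))


def group_hits(hits, element):
    """Break a hit list into groups keyed by the value of one element."""
    groups = defaultdict(list)
    for d in hits:
        groups[d.get(element, "")].append(d)
    return dict(groups)


# "red" qualified by context: only occurrences inside <body> count,
# so the document titled "Red Pine" is not a hit here.
hits = sort_hits(search(TEXTBASE, "red", "body"), "title")
grouped = group_hits(hits, "category")

for category, docs in grouped.items():
    print(category, [d["title"] for d in docs])
```

The same word searched in a different context gives a different hit list: search(TEXTBASE, "red", "title") matches only the "Red Pine" entry. A production system would answer such queries from prebuilt indexes rather than by scanning every document, as this sketch does.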

Edibase SGML was developed for Documensa’s own use, and the technology has been used in a number of commercial products Documensa has created for clients. The underlying database engine is Documensa’s own. The server software runs under Windows NT 3.5, Windows 95 and several varieties of Unix. The client software runs under Windows (3.1 or 95) and Macintosh System 7.

Documensa plans to offer Edibase SGML as a product in the first half of 1998. There will be a GUI for the database designer. (This was not yet ready at SGML/XML ’97.) The price will probably be around $10,000–$15,000 for the server software, plus $350–$500 per CPU for the client. For large installations, the per-user price will be lower, and the CD-ROM client can cost as little as $0.75 per copy in large quantities.