Using Topic Maps to Extend Relational Databases
March 5, 2003
Relational databases are great for storing structured data which conforms to a well-defined relational database schema. They are not so good at storing information that does not conform to such a schema. Since user requirements inevitably change, this means costly database upgrades.
To avoid too many such upgrades, in a larger database one often sees provisions which allow users to add relevant information which does not fit nicely in the database schema. The lowest-level extension is a free-text field in a record which allows users to add textual data. Another common extension is the introduction of user-defined value lists.
While such mechanisms are powerful, they are not standardized and do not easily allow the interchange of data. Topic Maps provide a very flexible and robust way to add arbitrary data to a relational databases at runtime. Moreover, Topic Maps come with a predefined exchange mechanism (the XML Topic Maps (XTM) interchange syntax) to allow data to be exported to XML.
A Sample Database
In this article I will use as an example a very simple relational database containing two tables, Employee and Department. Every Employee belongs to a single Department. A Department has a manager, who is an Employee. (I've put the sample Access database online, see references below.)
Here's a basic screen for manipulating the database tables:
Now suppose this is all we have come up with when the database was designed, and we want to enable users to add data which does not fit in the database schema without requesting a database upgrade. For instance, a user finds the need to add countries to the database and assign nationalities to Employees or add courses and assign course enrollments to Employees. Topic Maps provide a powerful paradigm whereby users can add such data at runtime, not design time.
Topics: New Things in the Database
First some Topic Map tables have to be added. I will use a simplified subset of the full Topic Map capabilities. First we will add a table "topic" and a table "topic-type".
Both tables have a primary key which is an unique number. (I will use numerical artificial primary keys in all tables.) The "instanceOf" attribute of "topic" is a foreign key which indicates the topic type of a topic. Topic Maps allow topics to have multiple types; single inheritance will do for the purposes of this example. This enables the user to record topic type "Course" in the "topic-type" table and topics such as "TOPICMAPS", "XML" and "ONTOLOGY" in table "topic". The following screen enables the user to add topic types and corresponding topics.
This form also shows an application of topic names, display names, and sort names. Topic Maps allow many names, display names, and sort names for each topic; a single one of each will do here. Multiple names make sense in multiple-language applications. The solution would be to store names in a separate table and have "id_topic" as a foreign key in this table. Every topic must have a topic type in this approach. Topic Maps do not require this, but it is good design practice. In Topic Maps topic types are topics themselves. I have created a separate table for topic types, which makes it easier to build a good user interface. Keep in mind, however, that the topic table does not include all topics, since the topic types, which are topics, are stored in a table of their own.
If we want to assign nationalities to employees, Topic Maps provide associations to do this. Associations are relations between two or more topics. Before we can assign nationalities to employees, the employees themselves have to become topics. For this purpose we need a view on all topics in the database. The following SQL code does this:
SELECT topic_type.name & '_' & topic.id_topic AS id, topic_type.name AS type, topic.name AS name FROM topic, topic_type WHERE topic.instanceOf = topic_type.id_topic_type UNION SELECT 'TopicType_' & id_topic_type, 'Topic Type', name FROM topic_type UNION SELECT 'Employee_' & id_employee, 'Employee', employee.last_name & ", " & employee.first_name FROM employee;
This code yields the following result view:
All topics get a unique id which consists of the topic type, an underscore, and the primary key of its table. This will come in handy later when we create an XTM file from the tables. Whether the departments would have to be made topics depends on the expectations of the designers: will end-users need to store facts about departments? Clearly performance issues come into play here. In a large database a view which collects all rows from all tables will bring any RDBMS to its knees. It would be prudent to limit topics to tables which are central entities from a user point of view. Further, it makes sense to restrict access to the view of all topics for end-users, so that only a subset of all topics can be retrieved at a time.
Associations: Weaving Topics Together
Now we have employees as topics, it is time for the associations. The following tables implement associations:
An association type in Topic Maps is a also topic. Therefore the SQL code above will have to be adapted to include a UNION clause for association types. In Topic Maps roles belong to associations, not association types. However, storing the roles with the association type leads to a clean design.
For example, an association type "Marriage" will have as role types "Husband" and "Wife"; "Enrollment" will have "Student" and "Course"; "Nationality" will have "Citizen" and "Country". An association which is an instance of "Marriage" will connect two topics. These topics are indicated by the "member_1" and "member_2" attributes. Those attributes will point to two topics; for instance, employee "John Smith", who plays the "Husband" role, and "Susan Smith", who plays the "Wife" role. Topic Maps allow associations to have any number of members. Here we have only implemented binary associations, which cover most use cases. Once the user has entered appropriate association types and roles, the following form shows how the associations are entered and displayed for an employee.
The top fields in the form are the regular fields from the Employee table. The associations follow the regular fields. Type and roles are shown from the association_type table, members from the association table. Once a user enters an appropriate association type, such as "Enrollment", in the last row, the application can fill in the roles from the association type and the first member (being the employee whose form is displayed). All the user has to do is enter an appropriate second member.
Occurrences: Information Inside and Outside
Associations between topics are not the only type of data which can be stored in relational databases using Topic Map techniques. In the lower half of the form another Topic Map construct appears. Occurrences provide a powerful way to link topics within the database, such as employees, to documents elsewhere. Suppose for instance a user wants to relate a file "C:\reviews\Annual performance review Susan Smith 2002", or a site "http://www.susansmith.com" to Susan Smith. The following two tables support the addition of this kind of data:
An occurrence is an instance of an occurrence type. Like association types, occurrence types themselves are topics. Topic Maps support two different kind of occurrence: references to external resources, such as a website or file, and internal resources, which are character data elements in the Topic Map itself. This has been implemented by adding a "datatype" attribute to the occurrence_type table. When the value of this attribute is "URI", the resource attribute of the corresponding occurrences is interpreted as a URI pointing to an external resource, otherwise the resource attribute contains the character data which is the occurrence.
For external resources, the name of the occurrence type could be "Home page", the datatype "URI", and the value of the resource attribute a URI such as "http://www.susansmith.nl". For an occurrence type "Children", the value of the resource attribute will be the number of children of the employee. The solution with a datatype attribute can also be used to add a simple typing system to internal resources. The datatype for the "Children" occurrence type could be "natural number", and the application could check whether the value entered by the user is indeed a natural number. Topic maps themselves do not yet provide a standard way of specifying data types.
Export and Merging
With these few extra tables relational databases can be extended to support the addition of data that is not supported by the database schema. Notwithstanding this flexibility, the data added itself is highly structured through the use of topic types, association and occurrence types. And there is a bonus: the data in the database can easily be exported as an XTM file, providing a standardized interchange syntax.
The employees have already been defined as topics themselves, so employees will be exported to the XTM Topic Map. The attributes in the employee table can be exported as occurrences: the salary attribute can become an occurrence of type "Salary" with "10.000" as its value. Alternatively, an occurrence could be made which points to the record of the employee in the database, especially when the database interface is web-enabled. The sample database contains demo code to export data to a XTM file.
Once the data is exported to an XTM file, Topic Map applications can import the data. At this stage another strength of Topic Maps can be used: the ability to merge Topic Maps from different sources into a single Topic Map. When not only the employee database from the HRM department contains data about employees, but also the payroll database of the financial department, both databases can export their data as Topic Maps. After merging, information from both Topic Maps will be available.
Some Final thoughts
Some Topic Map features have not been implemented in this approach. Multiple inheritance of topics and n-ary associations have already been mentioned. Also scope and subject identity have not been implemented. Those features are not very hard to add.
Subject identity is very useful when you want to merge the exported Topic Map with other Topic Maps. The design shown here has been kept deliberately simple by making topic types, association types and occurrence types separate tables. (In a regular Topic Map, all those types are just topics which happen to have a topic, association or occurrence as an instance. By looking at a single topic, one cannot know whether it is a type or not.) Since all three types are topics themselves, they could be made generalizations of topics. This would enable true class hierarchies of topic types. The following figure shows this alternative design.
This approach is so powerful one might wonder whether all data could be stored as topics, associations, and occurrences. Well, it could, but it shouldn't. A Topic Map approach will never yield the same performance as data in tailor-made relational tables. The approach sketched above is not suitable for performance-critical, high-volume data. So when a particular data item such as "Nationality" has to be stored for each employee, making it an attribute in the employee table will be the preferred route. When data is of a more incidental nature, when only the nationality of foreign employees is relevant, making nationality an association and countries topics will work fine.
This approach has another huge benefit. Every database administrator has experienced users requesting a new functionality, to find when it has been added it is not used because business requirements have changed. When the new functionality is added with topics, associations, and occurrences, users have the functionality right away. When the new functionality is frequently used, it can be implemented in regular relational tables and attributes. So the Topic Map approach can function as a testbed for new functionality. Since the data is structured conversion of the topics, associations and occurrences to regular relational data is a relatively straightforward exercise.
To summarize, the Topic Map paradigm provides a powerful way to add data to a relational database at runtime in a very flexible and powerful way. Topic Maps provide an excellent technique to overcome the natural limitations of relational databases: the constraining nature of the database schema. An added bonus is the ability to export the data as an XTM file, thus enabling interchange with other Topic Maps.
Thanks to Steve Pepper for reviewing and providing useful suggestions.
- The sample database with demo XTM generation code is at http://www.marcdegraauw.com/tmextension/
- For an excellent introduction to Topic Map concepts, see Steve Pepper's The TAO of Topic Maps
- For another introduction which focuses more on the technical side, including the XTM interchange format, see Lars Marius Garshol's What are Topic Maps?
- The XML Topic Maps (XTM) 1.0 specification defines the XTM Topic Map XML-based interchange syntax.