SirixDB - a temporal storage system, which creates a new snapshot on each transaction commit
December 27, 2018
Submitted by Johannes Lichtenberger.
I'm developing an open-source storage system for versioning data at the subfile level. Its log-structured, copy-on-write (COW) design never overwrites data in place, which makes it especially well suited to SSDs. The core is implemented in Java.
Key features are:
- A novel versioning algorithm called sliding snapshot (along with other versioning algorithms).
- A diff algorithm that makes use of our stable record identifiers and, optionally, hashes, plus a second diff algorithm for importing similar XML documents as a versioned Sirix resource.
- Novel XPath axes to navigate not only in space, but also in time.
- Versioned index structures.
- A storage layout that is especially well suited to SSDs.
- Asynchronous, RESTful API built with Vert.x and Kotlin.
- Logarithmic storage and retrieval time complexity (O(log n)).
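
To make the copy-on-write idea concrete, here is a minimal sketch of a versioned key-value store in which every commit publishes a new, immutable snapshot. It is not SirixDB's API or page layout: the class CowStore and all of its methods are hypothetical names chosen for illustration, and the tree is a plain unbalanced binary search tree, so lookups are logarithmic only on average. What it does show is that an update copies only the nodes on the path from the root to the changed key, while everything else is shared with earlier revisions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of copy-on-write snapshots; not SirixDB code. */
class CowStore {
    /** Immutable tree node: once created, a node is never modified. */
    private static final class Node {
        final String key;
        final String value;
        final Node left;
        final Node right;

        Node(String key, String value, Node left, Node right) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // One root per committed revision; revision numbers start at 1.
    private final List<Node> revisions = new ArrayList<>();
    // Root of the tree being built by the current (uncommitted) transaction.
    private Node working;

    /** Returns a new root, copying only the path from the root to the key. */
    private static Node insert(Node node, String key, String value) {
        if (node == null) {
            return new Node(key, value, null, null);
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            return new Node(node.key, node.value, insert(node.left, key, value), node.right);
        }
        if (cmp > 0) {
            return new Node(node.key, node.value, node.left, insert(node.right, key, value));
        }
        return new Node(key, value, node.left, node.right);
    }

    void put(String key, String value) {
        working = insert(working, key, value);
    }

    /** Publishes the working tree as a new revision and returns its number. */
    int commit() {
        revisions.add(working);
        return revisions.size();
    }

    /** Reads a key as of the given revision; returns null if it is absent. */
    String get(int revision, String key) {
        Node node = revisions.get(revision - 1);
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) {
                return node.value;
            }
            node = cmp < 0 ? node.left : node.right;
        }
        return null;
    }

    public static void main(String[] args) {
        CowStore store = new CowStore();
        store.put("title", "draft");
        int r1 = store.commit();
        store.put("title", "final");
        store.put("author", "jl");
        int r2 = store.commit();

        System.out.println(store.get(r1, "title"));  // draft
        System.out.println(store.get(r2, "title"));  // final
        System.out.println(store.get(r1, "author")); // null
    }
}
```

Because committed nodes are never touched again, reading an old revision needs no locking and no undo log; the cost is that each update allocates a fresh copy of the path it modifies.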
We offer a rather low-level record storage. On top of that sits a node transaction layer, which stores and retrieves DOM-like nodes from persistent storage (or from a buffer manager, which keeps page fragments and the nodes they contain in memory). Another DOM-like layer serves as the binding to Brackit, an XQuery engine; in this layer, references are stored on the heap.
Recently, I've implemented a higher-level, asynchronous REST API with Kotlin (coroutines) and Vert.x in a separate module. The system is heavily inspired by the ZFS filesystem. My goal is to put forth the idea of a versioned, distributed storage system that easily supports temporal analytical tasks, which are best applied to a series of revisions in order to analyse how the data has changed. Other tasks might simply include easy undo/redo operations.
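
As a rough illustration of what such a temporal task can look like, the following sketch walks a series of revisions and reports how each one differs from its predecessor. It assumes, purely for brevity, that every revision is available as a flat map of keys to values. That is not how SirixDB stores data, and the class RevisionDiff and its diff method are hypothetical names, not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration of comparing revisions; not SirixDB code. */
class RevisionDiff {
    /** Lists how newer differs from older, one entry per changed key, in key order. */
    static List<String> diff(Map<String, String> older, Map<String, String> newer) {
        List<String> changes = new ArrayList<>();
        // Visit the union of both key sets in sorted order.
        TreeSet<String> keys = new TreeSet<>(older.keySet());
        keys.addAll(newer.keySet());
        for (String key : keys) {
            String before = older.get(key);
            String after = newer.get(key);
            if (before == null) {
                changes.add("INSERT " + key + "=" + after);
            } else if (after == null) {
                changes.add("DELETE " + key);
            } else if (!before.equals(after)) {
                changes.add("UPDATE " + key + ": " + before + " -> " + after);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Three made-up revisions of the same resource.
        List<Map<String, String>> revisions = List.of(
                Map.of("title", "draft"),
                Map.of("title", "final", "author", "jl"),
                Map.of("title", "final"));

        for (int i = 1; i < revisions.size(); i++) {
            System.out.println("r" + i + " -> r" + (i + 1) + ": "
                    + diff(revisions.get(i - 1), revisions.get(i)));
        }
        // r1 -> r2: [INSERT author=jl, UPDATE title: draft -> final]
        // r2 -> r3: [DELETE author]
    }
}
```

With every revision retained, an undo amounts to reading the previous revision again, and a redo to reading the later one.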