April 28, 2022
Submitted by Ghislain Fourny.
We are happy to announce that RumbleDB 1.18 beta is out.
With RumbleDB you can query terabytes of messy data in all kinds of formats (CSV, JSON, Parquet, text…) sitting in many places (S3, HDFS, your laptop…) and feed it right into machine learning pipelines.
As always, it comes with many bug fixes and stability improvements. Most importantly, though, the number of ways to install it (free and open source) keeps growing, for more simplicity and flexibility:
- as a standalone jar, executable with a java command
- as a small jar in an existing Spark local installation or a Spark cluster (Amazon EMR…), executable with a spark-submit command
- with a brew tap that installs a “rumbledb” command (we are still waiting for the Homebrew team to respond to our listing request; thank you muchly to all of you who supported us in this endeavor with your +1s)
- with Docker
This is all documented here: https://rumble.readthedocs.io/en/latest/Getting%20started/
We also have a brand-new Python dependency that makes it easy to connect a Jupyter notebook either to an instance of the “simple” RumbleDB server or, via the Apache Livy interface, to a large (more robust) cluster: https://pypi.org/project/rumbledb/
If you go to the live JSONiq tutorial, you will see that with this Python dependency it now takes only three lines in a fresh notebook to connect either to our public sandbox (for playing around) or to your own server: https://colab.research.google.com/github/RumbleDB/rumble/blob/master/RumbleSandbox.ipynb
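To show roughly what "connecting to a server" involves, here is a minimal standalone sketch in plain Python. It is not the API of the rumbledb package. It assumes that a RumbleDB server accepts the JSONiq query text as an HTTP POST body and replies with a JSON object holding either a "values" array or an "error-message" string. The helper name `run_jsoniq` and the default URL `http://localhost:8001/jsoniq` are hypothetical, so check the documentation for your server's actual address.

```python
import json
import urllib.request
from typing import Any, Callable, List

# Hypothetical default address; the "/jsoniq" path is an assumption
# about the RumbleDB server endpoint, so adjust it to your own setup.
DEFAULT_SERVER = "http://localhost:8001/jsoniq"


def _http_post(url: str, body: bytes) -> bytes:
    """POST raw bytes to url and return the raw response body."""
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_jsoniq(query: str,
               server_url: str = DEFAULT_SERVER,
               post: Callable[[str, bytes], bytes] = _http_post) -> List[Any]:
    """Send a JSONiq query to a RumbleDB server and return the result items.

    Assumed response format: a JSON object with either a "values" array
    (the query results) or an "error-message" string.
    The transport is injectable so the parsing logic can be tested offline.
    """
    raw = post(server_url, query.encode("utf-8"))
    payload = json.loads(raw.decode("utf-8"))
    if "error-message" in payload:
        raise RuntimeError(payload["error-message"])
    return payload.get("values", [])
```

Against a running server, a call such as `run_jsoniq("for $i in 1 to 3 return $i * $i")` would return the result items as a Python list. Passing the transport as a parameter keeps the sketch free of third-party dependencies and lets you swap in another HTTP client.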
And above all, it keeps being free and open source. Enjoy!