yencabulator 5 days ago
I'm writing a brand-new OLTP database with Rust, DataFusion and RocksDB (to be replaced with something in pure Rust later, hopefully). It's still very early days, but I'm soo close to finally being willing to share it. Close enough to Postgres that apps think it's Postgres, but also runnable as a library like SQLite -- best of both worlds! And if you're willing to not limit yourself to Postgres compatibility, you also get fancy new technologies like unsigned integers in a database! (The old SQL ecosystem is sometimes ridiculous.)

My personal journey goes something like this: I've always suffered from both SQLite and Postgres idiosyncrasies. Almost every time I deploy, I've wanted to start out small, without a big dedicated database server, and to have meaningful tests that don't drag in multi-gigabyte dependencies and runtime assumptions. A database that's "close enough to Postgres to not have to learn much new" but has the low-end abilities of SQLite is something I've wanted for roughly as long as I've known about SQLite -- even more so if it could also replace Postgres and remove the fear of differences between dev/early-stage and later.

Much later, I learned about the newly fashionable OLAP-over-object-store architectures, and I learned about Parquet. That led to discovering Arrow and DataFusion.

Arrow is an in-memory columnar data format intended to be a standard interchange layer. It's basically an array per column, which isn't exactly point-query oriented but helps make modern-day CPUs happy; it's quite well aligned with SIMD processing.

DataFusion is a Rust framework that's essentially a query engine, and it has a decent query planner (arguably the hardest part of writing a database). RocksDB supports transactions and does MVCC, which is probably the second hardest part of writing a database.

The rest just fell into place: sqlparser-rs is a Rust SQL syntax parser with Postgres (and other dialect) compatibility nicely worked out.
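To make "an array per column" concrete, here is a toy sketch in plain Rust. It is not the `arrow` crate's API: `Row` and `Columns` are made-up types for illustration only. It just contrasts storing whole rows together with keeping one contiguous `Vec` per column, loosely like an Arrow record batch keeps one buffer per column.

```rust
// Toy illustration of row-oriented vs column-oriented layout.
// NOT the arrow crate's API; `Row` and `Columns` are hypothetical types.

/// Row-oriented: each record's fields sit next to each other (array of structs).
#[derive(Debug, PartialEq)]
struct Row {
    id: u64,
    price: f64,
}

/// Column-oriented: one contiguous Vec per column (struct of arrays).
#[derive(Debug, Default)]
struct Columns {
    id: Vec<u64>,
    price: Vec<f64>,
}

impl Columns {
    /// Transpose a slice of rows into per-column vectors.
    fn from_rows(rows: &[Row]) -> Self {
        let mut cols = Columns::default();
        for r in rows {
            cols.id.push(r.id);
            cols.price.push(r.price);
        }
        cols
    }

    /// A scan/aggregate touches only one contiguous slice, which is
    /// cache-friendly and easy for the compiler to auto-vectorize.
    fn total_price(&self) -> f64 {
        self.price.iter().sum()
    }

    /// A point lookup must reassemble the record from every column,
    /// which is the access pattern columnar layouts are worse at.
    fn row(&self, i: usize) -> Option<Row> {
        Some(Row {
            id: *self.id.get(i)?,
            price: *self.price.get(i)?,
        })
    }
}

fn main() {
    let rows = vec![
        Row { id: 1, price: 2.5 },
        Row { id: 2, price: 4.0 },
        Row { id: 3, price: 1.5 },
    ];
    let cols = Columns::from_rows(&rows);
    println!("total = {}", cols.total_price()); // total = 8
    println!("{:?}", cols.row(1)); // Some(Row { id: 2, price: 4.0 })
}
```

The aggregate in `total_price` walks a single slice, while `row(i)` has to reach into every column to rebuild one record. That trade-off is why a columnar layout suits analytical scans better than point queries.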
pgwire implements the Postgres wire protocol. Non-legacy clients can use FlightSQL and Arrow IPC for faster data transfer (the Postgres wire protocol kinda sucks; it's that old). In-process use from Rust is darn trivial with DataFusion, and other languages can be dealt with by writing a C bridge -- once again, Arrow is already an inter-language standard, so all we need to do is shove the result data buffers over to a more native "dataframe" library.

It looks like I can actually glue these things together with less than a decade of effort! There's a lot still to worry about, but I'm feeling pretty positive about the project. And if and when I get to replace RocksDB with a pure-Rust data store that has all the right bells and whistles (in-house or not), the end result will be pure Rust, aligned with the modern world of NVMe, io_uring, and whatnot. That's a world I definitely want to live in.

Current status: Getting rid of the last `todo!()`s, unwraps, etc. that would distract too much from the "look at how robust this thing is" Rust evangelism. I need to put in stress tests and fault injection, and make sure I'm configuring RocksDB correctly for transaction isolation and disk persistence. There are tons of missing features, but very few bugs as such (0 known that aren't about C++ integration), and missing features all return a decent explanatory error message instead of eating data.

The darn thing already works as a SQL database -- largely because it's just DataFusion's query engine and me feeding it table scans. I wrote a SQL database without ever debugging a JOIN! The shortcuts I've been able to take thanks to preexisting projects are huge. For someone who grew up in the world of "every C project has to write basic data structures for themselves because C isn't very modular", it's downright amazing!
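A minimal sketch of the "explanatory error instead of `todo!()`" pattern described above. `DbError`, `Statement`, and `execute` are hypothetical stand-ins, not this project's actual code, and `CREATE TRIGGER` is just an arbitrary example of an unsupported statement.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DbError {
    /// A feature the engine recognizes but does not support yet.
    NotImplemented { feature: &'static str },
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DbError::NotImplemented { feature } => {
                write!(f, "not implemented yet: {feature}")
            }
        }
    }
}

impl std::error::Error for DbError {}

/// Hypothetical subset of parsed statement kinds.
#[derive(Debug)]
enum Statement {
    Insert { table: String },
    CreateTrigger,
}

fn execute(stmt: &Statement) -> Result<String, DbError> {
    match stmt {
        Statement::Insert { table } => Ok(format!("inserted into {table}")),
        // Writing `todo!()` here would panic at runtime. Returning an error
        // instead lets the caller report it, and nothing gets written.
        Statement::CreateTrigger => Err(DbError::NotImplemented {
            feature: "CREATE TRIGGER",
        }),
    }
}

fn main() {
    let stmts = [
        Statement::Insert { table: "users".to_string() },
        Statement::CreateTrigger,
    ];
    for s in &stmts {
        match execute(s) {
            Ok(msg) => println!("ok: {msg}"),
            Err(e) => println!("error: {e}"),
        }
    }
}
```

Because `execute` returns a `Result` rather than panicking, whatever sits above it (a wire-protocol handler, say) can pass the message on to the client and carry on.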