thom 5 hours ago

This is light on specifics, but it's still directionally the closest anyone has come to describing my ideal future data platform. The founder Dan Sotolongo's work at Snowflake included Dynamic Tables, which makes me think the proposal is basically this: a hybrid transactional/analytical database with declarative domain-oriented schemas, smart incrementally computed updates, and a la carte performance characteristics (i.e. opt into row or columnar storage, or whatever indexing you like). Nathan Marz's Rama is also close to this vision, but perhaps a little less accessible for many enterprises since it's more purely developer-focused. The win here is that you have a single system with all your data, all the time, and can build on it however you like. Piecing complex new services together across multiple microservices is deeply painful - if you can give me a single smart platform for all my data, and make it work operationally, that feels like a big win.
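To make "incrementally computed updates" concrete, here's a minimal Python sketch of incremental view maintenance - illustrative only, not Snowflake's actual Dynamic Tables machinery. The idea is that a derived view (here, revenue per region) is folded forward one row at a time instead of being recomputed from scratch:

```python
# Minimal sketch of incremental view maintenance (illustrative only).
# The "view" is a running revenue-per-region aggregate; each new order
# updates it in O(1) instead of rescanning the whole table.

from collections import defaultdict

class IncrementalView:
    """Maintains SELECT region, SUM(amount) GROUP BY region incrementally."""
    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, row):
        # Fold a single inserted row into the maintained aggregate.
        self.totals[row["region"]] += row["amount"]

    def query(self):
        return dict(self.totals)

view = IncrementalView()
for order in [{"region": "EU", "amount": 10.0},
              {"region": "US", "amount": 25.0},
              {"region": "EU", "amount": 5.0}]:
    view.apply(order)

print(view.query())  # {'EU': 15.0, 'US': 25.0}
```

A real system would also handle updates, deletes, joins, and consistency across concurrent writers - that's the hard part the platform would have to own.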

Obviously you can piece a lot of this together today; in fact Snowflake itself does a lot of it. But the other part of the article makes me think they understand the even harder part of the problem in modern enterprises: nobody has a clear view of the model they're operating under, or how it interacts with the rest of the business. It takes insane foresight and discipline to keep these things coherent, and the moment you try to integrate new acquisitions with different models you're in a world of pain. If you can create a layer that makes all of this explicit - the models, the responsibilities, the interactions, and the incompatibilities that may already exist - then mediate the chaos with some sort of AI handholding layer (because domain experts and disciplined engineers aren't always going to be around to resolve ambiguities), you solve not just a huge technical problem but a much more complicated ecological one.

Anyway, whatever they're working on, I think this is exactly the area to focus on if you want to transform modern enterprise data stacks. Throwing AI at existing heterogeneous systems and complex tech stacks might work, but building from scratch on a system that enforces cohesion while maintaining agility feels like it's going to win out in the end. Excited to see what they come up with!

jkhdigital 3 hours ago | parent

Sotolongo's lineage is Twitter observability -> Google streaming -> Snowflake Dynamic Tables, which is a declarative, relational, query-optimizer-centric tradition. Marz's lineage is Storm -> Trident -> Rama, which is a procedural-dataflow, programmer-controls-the-plan, event-sourcing-centric tradition. Both are trying to unify OLTP + OLAP + application logic + reactivity into a coherent substrate, but they're coming at it from opposite epistemological poles. Rama says "give the programmer fine-grained control over partitioning, indexing, and dataflow, and trust them to design the right physical representation for their queries." Cambra, if your inference about Dynamic Tables is right, will almost certainly say "let the programmer describe the domain model declaratively and let the system figure out the physical representation." This is the classic Codd-vs-CODASYL split, recapitulated forty years later with much more sophisticated machinery on both sides.
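You can caricature the two poles in a few lines of Python - these are hypothetical stand-ins, not real Rama or Snowflake APIs, just the shape of the control each side gives the programmer:

```python
# Sketch of the two poles (hypothetical stand-ins, not real Rama/Snowflake code).

from collections import Counter, defaultdict

events = [("alice", "click"), ("bob", "click"), ("alice", "buy")]

# Declarative pole: describe WHAT you want; the engine picks the physical plan.
# (Stand-in for something like "CREATE DYNAMIC TABLE ... GROUP BY user".)
clicks_per_user = Counter(user for user, kind in events if kind == "click")

# Procedural-dataflow pole: the programmer chooses partitioning and the
# physical index explicitly - hash-partition by user, update a per-shard map.
NUM_SHARDS = 2
shards = [defaultdict(int) for _ in range(NUM_SHARDS)]
for user, kind in events:
    if kind == "click":
        shard = shards[hash(user) % NUM_SHARDS]   # explicit partitioning choice
        shard[user] += 1                          # explicit index update

merged = {u: n for shard in shards for u, n in shard.items()}
assert merged == dict(clicks_per_user)  # same answer, opposite control
```

Same result either way; the argument is entirely over who decides the physical representation - the optimizer or the programmer.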

If this is the right framing, then the two systems aren't really competitors despite solving the same problem - they're going to appeal to fundamentally different developer sensibilities. Rama is for people who want to think like Jay Kreps or Martin Kleppmann: the event log is sacred, physical data layout is a first-class design decision, and the programmer earns the performance benefits by understanding the system deeply. Cambra (if these assumptions hold) will be for people who want to think like database users: describe what you want, let the optimizer figure out how, intervene only when necessary. These are both defensible positions and both have historical track records of working. SQL's history shows the declarative camp has ecosystem advantages once the optimizer is good enough; Kafka/Rama's history shows the log-centric camp has correctness and observability advantages for event-heavy domains.