Remix.run Logo
aguacaterojo 2 hours ago

I've spent a lot of time studying CRDTs & order theory in the last year & will publish an article too. Local-first apps are not easy to build, complex data structures (say, calendar repetitions with exceptions) become harder to model. Everything must converge automatically while clients can branch off.

In general, you don't really get to compact tombstones meaningfully without consensus so you really are pushing at least remnants of the entire log around to each client indefinitely. You also can't easily upgrade your db you're stuck looking after legacy data structures indefinitely - or you have to impose arbitrary cut off points.

List CRDTs - which text CRDTs are built from are probably unavoidable except for really really simple applications. Over the last 15 years they have evolved, roughly: WOOT (paper) -> RGA (paper) -> YATA (paper) / YJS (js + later rust port) -> Peritext (paper) / Automerge (rust/js/swift) -> Loro (Rust/js). Cola (rust) is another recent one. The big libs (yjs, automerge, loro) offer full doc models.

Mostly the later ones improve on space, time & intent capture (not interleaving concurrent requests).

The same few guys (Martin Kleppman, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.

josephg an hour ago | parent | next [-]

> In general, you don't really get to compact tombstones meaningfully without consensus so you really are pushing at least remnants of the entire log around to each client indefinitely.

This is often much less of a problem in practice than people think. Git repositories also grow without bound, but nobody seems to really notice or care. For diamond types (my library), I lz4 compress the text content itself within the .dt files. The metadata is so small that in many cases the space savings from lz4 compression makes the resulting .dt file - including full change history - smaller than the document on disk.

If anyone really cares about this problem, there are a couple other approaches:

1. In our Eg-walker algorithm, we don't use stable guids to identify insert positions. Thats where other algorithms go wrong, because they need to store these guids indefinitely in case someone references them. For eg-walker, we just store relative positions. This makes the algorithm work more like git, where you can do shallow clones. And everything works, until you want to merge a branch which was split off earlier than your clone point. Then you should be able to download the earlier edits back to the common branch point and merge. To support merging, you also only need to store the metadata of earlier edits. You don't need the inserted content. The metadata is usually tiny (~1-4 bits per keystroke for text is common with compression.)

2. Mike Toomim's Antimatter algorithm is a distributed consensus protocol which can detect when its safe to purge the metadata for old changes. It works even in the presence of partitions in the network. (Its conservative by default - if a partition happens, the network will keep metadata around until that peer comes back, or until some timeout.)

> The same few guys (Martin Kleppman, Kevin Jahns, Joseph Gentle, probably others) pop up all over the more recent optimisations.

Alex Good is another. He's behind a lot of the huge improvements to automerge over the last few years. He's wicked smart.

aguacaterojo 3 minutes ago | parent [-]

The man himself. Yeah agreed you guys have solved it. It is more a misfired crud dev instinct for me that sees it as wasteful. Just a different paradigm and not big in practice.

I've got eg-walker & Diamond Types in my reading/youtube backlog. Diamond Types went further down the backlog because of "wip" on the repo! I will look into Antimatter too.

fellowniusmonk an hour ago | parent | prev [-]

I love diamond types personally, which I think Loro uses in certain cases. I find not only is DT fast under normal usage, allows for easy archiving and retrieval of past data but it also quickly outputs a per user casuality rendering that LLMs can pretty easily rectify in most cases. Compact and archival to flat files works very well.