Remix.run Logo
dtj1123 6 hours ago

I'm also curious about this. I don't understand how deletion and modification can be made commutative operations in a way that makes sense

Groxx 5 hours ago | parent | next [-]

tombstones are sorta the default answer here (i.e. at simplest, you keep all data forever so you can merge correctly, but you hide anything where you've seen a tombstone after it).

but "makes sense" and ways to optimize that can change massively with context. e.g. for a chat app, as soon as you see "deleted message X", you can reasonably drop X and all past and future changes to X because they won't be shown by anyone (don't even need to sync them). if you do that with "deleted chars 87..93" in a text editor, past-edits that you receive in the future might affect the behavior (it might add chars before those, changing what that range means), so you can't simply forget those chars (e.g. an easy option is to replay all events that occur after an event syncs, but that means retaining all events forever). the semantics you choose and what you do with the data affect your outcomes a lot.

tbh this is one of the reasons I like the idea of a WASM-defined algorithm. no one algorithm will be "best" for all data, and the storage/computation/transmission savings can be extreme.

sanity 5 hours ago | parent [-]

Exactly, well put.

cassonmars 6 hours ago | parent | prev | next [-]

For a basic CRDT set, merge rules have to have some kind of temporality basis in the messages such that commutativity is preserved. usually it's a timestamp, sometimes it's an unforgeable value like a hash, e.g. A: { "prev_hash": null, "content": "foobar" } B: { "prev_hash": "<hash of A>", "content": "foobarbaz" } C: { "prev_hash": "<hash of B>", "content": "foobaz" }

and when played out of order, it's guaranteed to resolve to foobaz eventually or immediately, depending on when messages are received

when you encounter the scenario of a fork, there's usually a fork resolution rule, e.g. D: { "prev_hash": "<hash of B>", "content": "foobazbar" }

to resolve C vs D, sort lexicographically, choose direction of sort order and pick first

When you have non-continuous data due to messages dropping, e.g. you have B and perhaps an E that builds on C, you can either use the same lexicographic rule, or make the hash basis a combination of timestamp and hash, so you get temporality and lineage.

As for deletes, you have either the single set approach of simply making the message content empty and that _is_ the delete, or you have the 2-phase sets, where there exists an add set and a delete set.

Quite a few ways to approach it, but commutativity can be readily preserved.

sanity 5 hours ago | parent | prev [-]

It depends on the nature of the data, for example in our group chat app River[1] it stores the most recent messages - deleting the oldest to make room for the newest as necessary. Banning a user will remove them from the members list along with their messages, and recently banned users are stored in a banned list (like a tombstone).

So there is no one approach to this, rather you design the approach based on the application, and since contracts are just webassembly they are extremely flexible.

[1] https://github.com/freenet/river