josephg 5 days ago

> One large class of problems I'm thinking of is simply outside the scope of CRDTs. The whole idea of _eventual_ consistency doesn't really work for things like payment systems or booking systems.

Yes! I think of it as owned data and shared data. Owned data is data that is owned by one process or node. E.g. my bank balance, the position of my mouse cursor, the temperature of my CPU. For this stuff, you don’t want a CRDT. Use a database. Or a variable in memory or a file on disk. Broadcast updates if you want, but route all write requests through the data’s owner.
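A minimal sketch of the "owned data" pattern, assuming a single in-process owner. `OwnedValue`, `write`, and `subscribe` are hypothetical names for illustration, and the no-overdraft rule stands in for whatever invariant the owner enforces.

```python
from typing import Callable, List


class OwnedValue:
    """A value with exactly one owner; every write request goes through write()."""

    def __init__(self, initial: int) -> None:
        self._value = initial
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        # Readers get the current value immediately, then every later update.
        self._subscribers.append(callback)
        callback(self._value)

    def write(self, delta: int) -> bool:
        # The owner validates each request; a rejected write changes nothing.
        if self._value + delta < 0:
            return False
        self._value += delta
        for notify in self._subscribers:
            notify(self._value)
        return True


balance = OwnedValue(100)
seen: List[int] = []
balance.subscribe(seen.append)
ok_first = balance.write(-70)   # accepted, balance is now 30
ok_second = balance.write(-70)  # rejected, it would overdraw
```

Because there is only one writer, there is nothing to merge: the second request simply fails instead of producing a conflicting state.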

Then there’s shared data - like the source code for a project or an Apple Note. There, CRDTs might make sense - especially if you get branching and merging support along for the ride.

> Authors can step on each other's toes.

Yeah, when merging long-lived branches, the workflow most people want is what git provides: humans manually resolving conflicts. There’s no reason a CRDT couldn’t provide this. CRDTs have a superset of the information available to git. It’s weird nobody has coded a system like that up yet.

withinboredom 5 days ago | parent | next [-]

I think you have the right idea, but possibly the wrong perspective. You want your _source of truth_, which is the "owned data", to be strongly consistent. Your shared data is a "view of truth" which may be incomplete or in disagreement with the source of truth. For example, the color of the sky "right now" depends on where on the earth you are standing, but we can all agree that air is 'just barely blue' and that its color depends on the light shining into it and how much of it there is.

The _source of truth_ is these facts (like "the air is blue" or "the user inserted the letter A at position X" or "the CPU is 40 degrees"). The view of this source is what we see, and can be seen through a CRDT or any other lens.

josephg 5 days ago | parent [-]

The way I’m defining it, my shared state is the data we store in a CRDT. And CRDTs have strong eventual consistency. That’s what makes them great. So we can have a data structure which shows all users an identical view of the world.

Normally we do that by storing something totally different under the hood. Eg, git actually stores a commit graph. But the system makes a determinism guarantee: we promise that all users who have the same version checked out will see exactly the same thing. At one level, we’re storing “a list of facts” (the commit graph). But at another level of abstraction, we’re just storing application data. It’s just also replicated between many peers. And editable locally without network access.
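A minimal sketch of that determinism guarantee, assuming a toy model where the replicated state is a set of operations and the visible document is a pure function of that set. The `Op` tuple layout, `merge`, and `view` are illustrative names, not any real library's API, and the last-writer-wins rule inside `view` is just one simple choice.

```python
from typing import Dict, FrozenSet, Tuple

# An op is (lamport_timestamp, replica_id, key, value). Hypothetical layout.
Op = Tuple[int, str, str, str]


def merge(a: FrozenSet[Op], b: FrozenSet[Op]) -> FrozenSet[Op]:
    # Set union is commutative, associative and idempotent, so the order
    # in which peers exchange ops cannot change the merged result.
    return a | b


def view(ops: FrozenSet[Op]) -> Dict[str, str]:
    # Deterministic projection: replay ops in a fixed total order.
    # Later ops overwrite earlier ones for the same key.
    state: Dict[str, str] = {}
    for _, _, key, value in sorted(ops):
        state[key] = value
    return state


alice = frozenset({(1, "alice", "title", "Draft"), (2, "alice", "body", "Hello")})
bob = frozenset({(1, "bob", "title", "Notes")})

alice_then_bob = merge(alice, bob)
bob_then_alice = merge(bob, alice)
```

Both peers store "a list of facts" (the op set), but any two peers holding the same set get the same `view`, regardless of the order in which the ops arrived.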

withinboredom 4 days ago | parent [-]

> So we can have a data structure which shows all users an identical view of the world.

This is never true. You can prove that at some time now()-T where T > 0 you had the same view of the universe, but you cannot prove that you currently have the exact same view, because the act of checking itself takes time, so T is always greater than 0. Sometimes, this doesn't matter (T can be arbitrarily large and still effectively be zero -- like asking your friend if he is still married to that person. They can answer you days later, and it'll still be true), but sometimes even very small values of T cannot be assumed to be zero.

josephg 4 days ago | parent [-]

Well yeah, obviously you never know for sure that a remote peer doesn’t have some changes that they haven’t told you about yet. That’s also true with lots of platforms - like Google Docs and Notion and multiplayer video games. Seems fine though? I don’t understand why this matters for collaborative editing?

withinboredom 4 days ago | parent [-]

Have you ever worked on the same repo with >500 devs? 99% of the time, it doesn’t matter. People talk to people.

josephg 3 days ago | parent [-]

Yes; but I have no idea how that connects to anything else we’ve been discussing here.

sethev 5 days ago | parent | prev | next [-]

Pijul is a version control system based on a CRDT: https://pijul.org/manual/theory.html#conflicts-and-crdts

It works like you describe, with humans manually resolving conflicts. The conflicts are represented in the data model, so the data model itself converges without conflicts... if that makes sense.

johnecheck 5 days ago | parent | prev | next [-]

Conflict-free is right in the name, layering conflicts on top of it would be blasphemy :p

evelant 5 days ago | parent | prev | next [-]

See my comment below, I prototyped something like this. https://news.ycombinator.com/item?id=45180325

josephg 5 days ago | parent [-]

Interesting idea. As I understand it though, this wouldn’t give you the kind of conflict semantics I’m talking about out of the box. What I want is - if two users concurrently edit the same line of text, the system can “merge” those changes by storing the conflict. Subsequent readers of the document see a merge conflict and can resolve the conflict manually.
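A minimal sketch of "merge by storing the conflict", assuming each line is a multi-value register tagged with version vectors. `dominates`, `merge`, and `resolve` are hypothetical helpers written for this example; a real text CRDT would track far more than a single line.

```python
from typing import Dict, List, Tuple

VersionVector = Dict[str, int]
Versioned = Tuple[VersionVector, str]  # (version vector, line text)


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen everything b has (a >= b component-wise)."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def merge(left: List[Versioned], right: List[Versioned]) -> List[Versioned]:
    """Drop values that some other value strictly supersedes; keep the rest.

    More than one surviving value means the edits were concurrent: a conflict.
    """
    candidates = left + right
    kept: List[Versioned] = []
    for vv, text in candidates:
        superseded = any(
            dominates(other, vv) and not dominates(vv, other)
            for other, _ in candidates
        )
        if not superseded and (vv, text) not in kept:
            kept.append((vv, text))
    # Sort so every replica lists the conflicting values in the same order.
    return sorted(kept, key=lambda item: sorted(item[0].items()))


def resolve(conflict: List[Versioned], replica: str, text: str) -> List[Versioned]:
    """A human picks the final text; the new version supersedes every branch."""
    merged: VersionVector = {}
    for vv, _ in conflict:
        for r, count in vv.items():
            merged[r] = max(merged.get(r, 0), count)
    merged[replica] = merged.get(replica, 0) + 1
    return [(merged, text)]


base = [({"alice": 1}, "hello world")]
alice_edit = [({"alice": 2}, "hello, world")]           # Alice edits the line
bob_edit = [({"alice": 1, "bob": 1}, "hello world!")]   # Bob edits it concurrently

conflict = merge(alice_edit, bob_edit)      # both values survive
resolved = resolve(conflict, "alice", "hello, world!")
```

The register itself still converges: every replica ends up with the same pair of conflicting values, and a later resolution replaces both once it propagates.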

Your system looks like it just enforces a global order on the actions. This will give you SEC - but how do you preserve the information that these edits were concurrent - and thus conflict with one another?
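A minimal sketch of the distinction, assuming each edit carries both a Lamport timestamp (enough for a global order) and a version vector (needed to detect concurrency). The `Edit` type and its fields are hypothetical and say nothing about how the linked prototype actually works.

```python
from typing import Dict, List, NamedTuple


class Edit(NamedTuple):
    lamport: int
    replica: str
    seen: Dict[str, int]  # version vector at the time of the edit, including it


def leq(a: Dict[str, int], b: Dict[str, int]) -> bool:
    return all(b.get(r, 0) >= n for r, n in a.items())


def happened_before(a: Edit, b: Edit) -> bool:
    # b causally follows a if b had seen everything a had, and not vice versa.
    return leq(a.seen, b.seen) and not leq(b.seen, a.seen)


def concurrent(a: Edit, b: Edit) -> bool:
    return not happened_before(a, b) and not happened_before(b, a)


first = Edit(1, "alice", {"alice": 1})
reply = Edit(2, "bob", {"alice": 1, "bob": 1})  # Bob had seen Alice's edit
offline = Edit(1, "carol", {"carol": 1})        # Carol had seen nothing

# A global order: every replica sorts the same way, so state converges (SEC).
order: List[Edit] = sorted(
    [reply, offline, first], key=lambda e: (e.lamport, e.replica)
)
```

The sorted order puts `first` ahead of both other edits, yet only `reply` causally follows it. `concurrent(first, offline)` is true, and that is the information a pure total order throws away.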

evelant 4 days ago | parent [-]

You're right, it's not the same as conflict/merge semantics, but you probably could implement those semantics on top of it. My idea was more about being able to merge offline states for arbitrary data without user intervention while also ensuring that application invariants / semantics are preserved. That is, preserving app semantics while preserving user intentions as much as possible.

fauigerzigerk 5 days ago | parent | prev [-]

> CRDTs have a superset of the information available to git. It’s weird nobody has coded a system like that up yet.

That's an interesting idea. I have to think about this.