| ▲ | fauigerzigerk 5 days ago |
| There are two rather different issues. One large class of problems I'm thinking of is simply outside the scope of CRDTs. The whole idea of _eventual_ consistency doesn't really work for things like payment systems or booking systems. A lot of OLTP applications have to be consistent at all times (hence the O). Money must not be double spent. Rooms or seats must not be double booked. The other class of problems is more debatable. CRDTs can guarantee that collaborative text editing results in the same sequence of letters on all nodes. They cannot guarantee that this sequence makes sense. Authors can step on each other's toes. Whether or not this is a problem depends on the specific workflow and I think it could be mitigated by choosing better units of storage/work (such as paragraphs rather than letters). |
|
| ▲ | josephg 5 days ago |
| > One large class of problems I'm thinking of is simply outside the scope of CRDTs. The whole idea of _eventual_ consistency doesn't really work for things like payment systems or booking systems. Yes! I think of it as owned data and shared data. Owned data is data that is owned by one process or node. Eg my bank balance, the position of my mouse cursor, the temperature of my CPU. For this stuff, you don’t want a crdt. Use a database. Or a variable in memory or a file on disk. Broadcast updates if you want, but route all write requests through the data’s owner. Then there’s shared data - like the source code for a project or an apple note. There, CRDTs might make sense - especially if you get branching and merging support along for the ride. > Authors can step on each other's toes. Yeah when merging long lived branches, the workflow most people want is what git provides - of humans manually resolving conflicts. There’s no reason a crdt couldn’t provide this. CRDTs have a superset of the information available to git. It’s weird nobody has coded a system like that up yet. |
| |
| ▲ | withinboredom 5 days ago |
| I think you have the right idea, but possibly the wrong perspective. You want your _source of truth_, which is the "owned data", to be strongly consistent. Your shared data is a "view of truth" which may be incomplete or in disagreement with the source of truth. For example, the color of the sky "right now" depends on where on the earth you are standing, but we can all agree that air is 'just barely blue' and that its apparent color depends on the light shining into it and how much of it there is. The _source of truth_ is these facts (like "the air is blue" or "the user inserted the letter A at position X" or "the CPU is 40 degrees"). The view of this source is what we see, and can be seen through a CRDT or any other lens. |
| ▲ | josephg 5 days ago |
| The way I’m defining it, my shared state is the data we store in a crdt. And CRDTs have strong eventual consistency. That’s what makes them great. So we can have a data structure which shows all users an identical view of the world. Normally we do that by storing something totally different under the hood. Eg, git actually stores a commit graph. But the system makes a determinism guarantee: we promise that all users who have the same version checked out will see exactly the same thing. At one level, we’re storing “a list of facts” (the commit graph). But at another level of abstraction, we’re just storing application data. It’s just also replicated between many peers. And editable locally without network access. |
| ▲ | withinboredom 4 days ago |
| > So we can have a data structure which shows all users an identical view of the world. This is never true. You can prove that at some time now()-T where T > 0 you had the same view of the universe, but you cannot prove that you currently have the exact same view, because even the act of checking makes T greater than 0. Sometimes this doesn't matter (T can be arbitrarily large and still effectively be zero -- like asking your friend if he is still married to that person. They can answer you days later, and it'll still be true), but sometimes even very small values of T cannot be assumed to be zero. |
| ▲ | josephg 4 days ago |
| Well yeah obviously you never know for sure that a remote peer doesn’t have some changes that they haven’t told you about yet. That’s also true with lots of platforms - like google docs and Notion and multiplayer video games. Seems fine though? I don’t understand why this matters for collaborative editing? |
| ▲ | withinboredom 4 days ago |
| Have you ever worked on the same repo with >500 devs? 99% of the time, it doesn’t matter. People talk to people. |
| ▲ | josephg 3 days ago |
| Yes; but I have no idea how that connects to anything else we’ve been discussing here. |
| |
| ▲ | sethev 5 days ago |
| Pijul is a version control system based on a CRDT: https://pijul.org/manual/theory.html#conflicts-and-crdts It works like you describe, with humans manually resolving conflicts. The conflicts are represented in the data model, so the data model itself converges without conflicts...if that makes sense. |
| ▲ | johnecheck 5 days ago |
| Conflict-free is right in the name, layering conflicts on top of it would be blasphemy :p |
| ▲ | evelant 5 days ago |
| See my comment below, I prototyped something like this. https://news.ycombinator.com/item?id=45180325 |
| ▲ | josephg 5 days ago |
| Interesting idea. As I understand it though, this wouldn’t give you the kind of conflict semantics I’m talking about out of the box. What I want is - if two users concurrently edit the same line of text, the system can “merge” those changes by storing the conflict. Subsequent readers of the document see a merge conflict and can resolve the conflict manually. Your system looks like it just enforces a global order on the actions. This will give you SEC - but how do you preserve the information that these edits were concurrent - and thus conflict with one another? |
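One way to picture "merging by storing the conflict" is a multi-value register: each write carries a version vector, and a merge keeps every value that no other value has superseded. The sketch below is only an illustration under that assumption; the names `LineRegister` and `dominates` are hypothetical, and it is not how Pijul, evelant's prototype, or any system mentioned in this thread is implemented.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if version vector `a` has seen every event that `b` has."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


class LineRegister:
    """One line of text whose concurrent edits are kept side by side."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        # List of (value, version_vector). More than one entry means the
        # line is in conflict and a human still has to resolve it.
        self.versions = []

    def _seen(self) -> dict:
        # Pointwise maximum of every vector this replica currently holds.
        clock = {}
        for _, vector in self.versions:
            for replica, count in vector.items():
                clock[replica] = max(clock.get(replica, 0), count)
        return clock

    def write(self, value: str) -> None:
        # A local write supersedes everything this replica has seen, so it
        # also serves as the "resolve the conflict manually" step.
        clock = self._seen()
        clock[self.replica_id] = clock.get(self.replica_id, 0) + 1
        self.versions = [(value, clock)]

    def merge(self, other: "LineRegister") -> None:
        candidates = self.versions + other.versions
        kept = []
        for value, vector in candidates:
            superseded = any(
                dominates(v, vector) and v != vector for _, v in candidates
            )
            if not superseded and (value, vector) not in kept:
                kept.append((value, vector))
        self.versions = kept

    def values(self) -> list:
        # Sorted so that replicas holding the same state print the same thing.
        return sorted(value for value, _ in self.versions)
```

With this shape, two replicas that edit the line concurrently both end up holding two values after merging, which a reader can display as a conflict. A later `write` on either side replaces both once it propagates. The merged state is still deterministic, so the structure converges even though it contains a conflict.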
| ▲ | evelant 4 days ago |
| You're right, it's not the same as conflict/merge semantics, but you probably could implement those semantics on top of it. My idea was more about being able to merge offline states for arbitrary data without user intervention while also ensuring that application invariants / semantics are preserved. Preserving app semantics while as much as possible preserving user intentions. |
|
| |
| ▲ | fauigerzigerk 5 days ago |
| >CRDTs have a superset of the information available to git. It’s weird nobody has coded a system like that up yet. That's an interesting idea. I have to think about this. |
|
|
| ▲ | gritzko 5 days ago |
| The classical paper-ledger bookkeeping is pretty much eventually consistent. They did not have the Internet when they invented it. Flight booking is often statistically consistent only. Overbooking, etc. |
| |
| ▲ | fauigerzigerk 5 days ago |
| >The classical paper-ledger bookkeeping is pretty much eventually consistent. They did not have the Internet when they invented it. Absolutely. Bookkeeping is an offline activity (I'm only doing it once a year in my company, ha ha). You just have to make sure not to record the same transaction more than once, which could be non-trivial but shouldn't be impossible to do with CRDTs. >Flight booking is often statistically consistent only. Overbooking, etc. That may be acceptable in some cases but you still can't use CRDTs for it, because you need a way to limit the extent of overbooking. That requires a centralised count of bookings. |
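To make the overbooking concern concrete, here is a small sketch of a naive grow-only counter where each replica checks the limit against only what it has seen locally. The class name `BookingCounter` and its methods are hypothetical, and the sketch covers only this naive design; it says nothing about schemes that coordinate ahead of time.

```python
class BookingCounter:
    """Grow-only counter: one slot per replica, merged by taking the max."""

    def __init__(self, replica_id: str, limit: int):
        self.replica_id = replica_id
        self.limit = limit
        self.counts = {}  # replica_id -> bookings accepted by that replica

    def total(self) -> int:
        return sum(self.counts.values())

    def try_book(self) -> bool:
        # The check uses only locally known bookings, so it cannot see
        # bookings accepted elsewhere that have not been merged yet.
        if self.total() >= self.limit:
            return False
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1
        return True

    def merge(self, other: "BookingCounter") -> None:
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)
```

If two disconnected replicas share a limit of one seat, both calls to `try_book` succeed, and the merged total is two. The replicas converge on the same number, but nothing bounded how far past the limit that number could go.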
| ▲ | josephg 5 days ago |
| Most complex crdts are built on top of the simple crdt of a grow only set. Ie, what we actually synchronise over the network is a big bag of commits / operations / something such that the network protocol makes sure everyone ends up with all of the operations known to any peer. Then the crdt takes that big set and produces some sort of sensible projection from it. > You just have to make sure not to record the same transaction more than once So this should be pretty easy. Have a grow only set of transactions. Give each one a globally unique ID at the point of creation. Order by date and do bookkeeping. One thing you can’t guarantee is that the balance is always positive. But otherwise - yeah. |
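The recipe above (a grow-only set of transactions, a unique ID assigned at creation, ordering by date) can be sketched roughly as follows. The names `Transaction`, `new_transaction` and `Ledger` are made up for illustration, amounts are assumed to be integer cents, and the ID is used as a tie-breaker for same-day entries so every replica sorts identically.

```python
import uuid
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    tx_id: str         # globally unique, assigned once at creation
    day: date
    amount_cents: int  # positive = money in, negative = money out
    memo: str = ""


def new_transaction(day: date, amount_cents: int, memo: str = "") -> Transaction:
    return Transaction(uuid.uuid4().hex, day, amount_cents, memo)


class Ledger:
    """Grow-only set of transactions, keyed by ID so duplicates collapse."""

    def __init__(self):
        self._txs = {}  # tx_id -> Transaction

    def add(self, tx: Transaction) -> None:
        # Recording the same transaction twice is a no-op.
        self._txs.setdefault(tx.tx_id, tx)

    def merge(self, other: "Ledger") -> None:
        # Set union: commutative, associative and idempotent.
        for tx in other._txs.values():
            self.add(tx)

    def ordered(self) -> list:
        # The projection: sort by date, then by ID for a stable order.
        return sorted(self._txs.values(), key=lambda t: (t.day, t.tx_id))

    def balance(self) -> int:
        return sum(t.amount_cents for t in self._txs.values())
```

Merging in any order, any number of times, gives every replica the same ordered list and the same balance. Nothing stops that balance from going negative, since each replica can accept a withdrawal without knowing about withdrawals recorded elsewhere.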
|
|