Aurornis 4 days ago

Almost every time I see CRDTs mentioned it’s used as a magic device that makes conflicts disappear. The details, of course, are not mentioned.

Technically an algorithm that lets the last writer win is a CRDT because there is no conflict.

Making a true system that automatically merges data while respecting user intent and expectations can be an extremely hard problem for anything complex like text.

Another problem is that in some situations using a CRDT to make the conflict disappear isn’t even the right approach. If two users schedule a meeting for the same open meeting room, you can’t avoid issues with an algorithm. You need to let them know about the conflict so they can resolve the issue.

josephg 3 days ago | parent | next [-]

> Technically an algorithm that lets the last writer win is a CRDT because there is no conflict.

Yes. It’s called a LWW register in the literature. Usually a MV (multi value) register is more useful. When there is a conflict, MV registers store all conflicting values and make any subsequent reader figure out what to do.
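To make the difference between the two registers concrete, here is a minimal Python sketch. It is not taken from any particular CRDT library; the names `LWWRegister`, `dominates` and `mv_merge`, the integer logical timestamp, and the dict-based version vectors are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins register (hypothetical sketch)."""
    value: object
    timestamp: int   # logical clock value, assumed to come from the app
    replica_id: str  # breaks ties so every replica picks the same winner

    def merge(self, other):
        # Keep whichever write has the larger (timestamp, replica_id) pair.
        return max(self, other, key=lambda r: (r.timestamp, r.replica_id))


def dominates(a, b):
    """True if version vector a has seen everything in b, plus something more."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))


def mv_merge(left, right):
    """Multi-value register merge over lists of (value, version_vector) pairs.

    Values overwritten by a causally later write are dropped; concurrent
    values are all kept, and the reader decides what to do with them.
    """
    candidates = left + right
    survivors = []
    for value, vv in candidates:
        if any(dominates(other_vv, vv) for _, other_vv in candidates):
            continue
        if (value, vv) not in survivors:
            survivors.append((value, vv))
    return survivors
```

With two concurrent writes, the LWW register silently keeps one of them, while `mv_merge` returns both and leaves the choice to whoever reads the register next.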

But for my money, usually what you want is something operation based. Like, if we both append to a transaction list, the list should end up with both items. Operation based CRDTs can handle any semantics you choose - so you can mix and match different merging approaches based on the application. And the storage for LWW and MV is the same, so you can start simple and grow your app as needed.

IMO the reason local first software is still not popular is the same reason encrypted messaging apps took a while. There’s a lag between good CS research enabling a class of applications and good implementations being available with good UX. Most CRDT implementations are still pretty new. Automerge has been really slow until quite recently. Most libraries don’t support ephemeral data or binary blobs well. And there aren’t a lot of well defined patterns around user login and authentication. Local first apps today have a lot of work ahead of them to just exist at all. Give it some time.

LanceH 4 days ago | parent | prev | next [-]

The details appear to be not giving a damn about changes to data. I personally wouldn't describe that as "converging". As it says, this is "Last Write Wins", which is great if the last writer is always correct.

"If it’s older → ignore" -- yea, I guess that's a solution but I would really have to look for the problem.

I've gone down this road and considered GitHub (or similar) as the backing database. In the end, I just don't have an answer that isn't specific to nearly every field that might be entered. A notes field might be appended. First in might be important, or last in (if it can be trusted as correct). Usually it's something like, "which of these two did you mean to capture?"

mcv 3 days ago | parent [-]

Funny thing is that the article gives an example where "last write wins" is quite clearly a bad solution.

Balance = 100 A: balance = 120 B: balance = 80

Clearly, these are transactions, and it doesn't matter in which order they're applied, but it does matter that they're both executed. End balance should be 100, not 80 or 120.
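A small Python sketch of the difference, assuming each replica either stores the whole balance (last write wins) or records a delta tagged with a unique operation id. The function names, the `(timestamp, balance)` tuples and the op ids are made up for illustration.

```python
def lww_merge(a, b):
    """Each write is (timestamp, new_balance); the later timestamp wins."""
    return max(a, b)


def apply_ops(start, ops):
    """Each op is (op_id, delta).

    Deduplicating by op_id makes re-delivery harmless, and addition is
    commutative, so the order in which ops arrive does not matter.
    """
    unique = dict(ops)
    return start + sum(unique.values())


# A deposits 20 and B withdraws 20, both starting from a balance of 100.
write_a = (1, 120)
write_b = (2, 80)
print(lww_merge(write_a, write_b)[1])   # 80: A's deposit is lost

ops = [("a1", +20), ("b1", -20)]
print(apply_ops(100, ops))              # 100: both transactions applied
```

The second half is the operation-based approach: replicas exchange what happened rather than the resulting state.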

LanceH 3 days ago | parent [-]

I was thinking about this overnight and maybe my beef is the article feels like it's written as a solution to "offline", when really it's a much narrower solution.

This solution doesn't nearly move us toward making local-first apps more popular, which was nominally the theme.

rogerrogerr 4 days ago | parent | prev | next [-]

I’ve never really thought about this - how does Outlook handle this? Has anyone received a “sorry, that room you reserved actually wasn’t available; talk to this other dude who reserved it too” message after reserving a meeting room?

Or does it just double book the room? Or is there a global lock on any transaction affecting a meeting room, so only one goes through? (Feels like it doesn’t scale)

appreciatorBus 4 days ago | parent | next [-]

In Google Workspace, rooms are resources with calendars that can be configured to auto accept any invitation unless they’re already booked. So it’s basically first come, first served. Even if two people are literally trying to book the room at the same time, one request will go through first and will be accepted and the second will be declined. I imagine Outlook is similar.
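Roughly, the first-come-first-served behaviour can be sketched like this in Python. This is not how Google Workspace or Outlook is actually implemented; `RoomCalendar`, `request` and the in-memory dict are hypothetical stand-ins for whatever the server does.

```python
import threading


class RoomCalendar:
    """Hypothetical auto-accepting room resource."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bookings = {}  # slot -> organizer

    def request(self, slot, organizer):
        """Accept the invitation unless the slot is already taken."""
        # The lock makes check-then-book atomic, so two simultaneous
        # requests for the same slot cannot both be accepted.
        with self._lock:
            if slot in self._bookings:
                return False  # declined: someone got there first
            self._bookings[slot] = organizer
            return True       # accepted
```

Whichever request takes the lock first is accepted and the other is declined, which only works because a single authority is making the decision.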

yccs27 4 days ago | parent [-]

In other words, Google sacrifices availability/latency here - they don't accept the request until they can be sure it's still available.

coldtea 3 days ago | parent [-]

They can accept the request (accept as in receive for processing).

They just can't send the acknowledgement of "successfully booked" yet.

account42 3 days ago | parent | prev | next [-]

> Or is there a global lock on any transaction affecting a meeting room, so only one goes through? (Feels like it doesn’t scale)

Why wouldn't it scale? How many meetings are booked per second in your organization???

throwaway4226 3 days ago | parent | next [-]

I think the potential for abuse is high. With a locking system, someone could (and probably would) click (manually or with a script) on a time slot to "reserve" a room just in case they needed it.

jon-wood 3 days ago | parent | next [-]

These are physical meeting rooms within a company. The resolution to this sort of abuse doesn't need to be automated, first it's a person in the facilities team having a quiet chat with the person doing that and asking them not to, eventually it gets escalated through various managers until it's a very final chat with HR before being asked to leave the building and not come back.

coldtea 3 days ago | parent | prev [-]

And if they did it a lot, they're scolded or fired.

That's not a real problem - at least not in the "book a corporate meeting room" space.

kirici 3 days ago | parent | prev [-]

Clearly you fell for the premature measuring fallacy, everyone knows to optimize for web-scale first.

bmm6o 4 days ago | parent | prev | next [-]

Exchange server accepts or rejects meeting requests. There's no offline room reservation so it's pretty simple.

bootsmann 3 days ago | parent [-]

Presumably exchange server is not a single node?

immibis 3 days ago | parent [-]

Then it does whatever is needed to make it safe. For example, it might use a hash ring to assign each meeting room to a single node, and that node processes one request at a time. Most distributed systems are like this.
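A minimal Python sketch of the hash ring idea, with no claim that Exchange works this way. The `HashRing` class, the node names and the choice of 64 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping each room to exactly one node."""

    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per node on the ring so that
        # rooms spread fairly evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def owner(self, room):
        """Return the node responsible for this room."""
        # First ring point clockwise from the room's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(room))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("room-101"))  # every client computes the same owner
```

Since every request for a given room lands on the same node, that node can process them one at a time without coordinating with the others.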

A traditional database funnels all your data changes down to one leader node which then accepts or rejects them, and (if A+C in the case of single node failure is desired) makes sure the data is replicated to follower nodes before accepting.

A distributed database is similar but different pieces of data can be on different leaders and different follower sets.

This comment was rate-limited.

Aurornis 3 days ago | parent | prev | next [-]

> I’ve never really thought about this - how does Outlook handle this?

Simple: It’s server based. These problems are trivial when the server is coordinating responses and the clients can only reserve a room if the server confirms it.

This is the problem that gets hand waved away with local first software that has multi user components: It doesn’t take long before two users do something locally that conflicts. Then what do you do? You have to either force a conflict resolution and get the users to resolve it, or you start doing things like discarding one of the changes so the other wins.

coldtea 3 days ago | parent | prev [-]

It doesn't scale universally, but it doesn't need to: it only needs to cover a specific company/organization/department. So it's trivial to work at that scale.

Hell, it's so feasible, it can even be done manually IRL by some person (like discussions where a person holds the "talking stick" and only they are allowed to speak until they pass it to another person - that's a lock).

tylerchilds 4 days ago | parent | prev | next [-]

You can resolve it with an algorithm, like so

- prefer seniority
- prefer pay scale
- prefer alphabetical
- roll dice

That’s how a business would probably do it since the first two alone align with how the business already values their Human Resources, which would translate to “the objects that the Human Resources compete for”
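As a Python sketch, the preference list above could become a sort key. The `Booker` fields and the `employee_id` are hypothetical. One swap is made on purpose: a real dice roll would give different answers on different replicas, so the last step uses a hash of the slot and employee id instead, which is arbitrary but identical everywhere.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Booker:
    name: str
    employee_id: str       # hypothetical unique id, used only for the "dice"
    years_at_company: int  # seniority
    pay_grade: int


def dice(employee_id, slot):
    """Deterministic stand-in for rolling dice, so all replicas agree."""
    return hashlib.sha256(f"{slot}:{employee_id}".encode()).hexdigest()


def winner(bookers, slot):
    """Pick who keeps the room: seniority, then pay, then name, then dice."""
    return min(
        bookers,
        key=lambda b: (
            -b.years_at_company,        # prefer seniority
            -b.pay_grade,               # prefer pay scale
            b.name,                     # prefer alphabetical
            dice(b.employee_id, slot),  # "roll dice" only if all else ties
        ),
    )
```

Because the key is a pure function of the bookers and the slot, every replica picks the same winner no matter what order the requests arrive in.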

arccy 3 days ago | parent [-]

and the interns get the blame for why they can't book rooms, but for the people managing them it's just so easy.

tylerchilds 3 days ago | parent [-]

Incorrect.

In a well designed system, the intern will be delegated with “room booking authority” on behalf of their most senior manager on the calendar invite.

Using something like this, that would be in the CRDT resolution algorithm.

https://w3c-ccg.github.io/zcap-spec/

Company culture will recognize it is an HR problem.

motorest 4 days ago | parent | prev [-]

> Technically an algorithm that lets the last writer win is a CRDT because there is no conflict.

Your comment shows some ignorance and a complete misunderstanding of the problem domain.

The whole point of CRDTs is that the set of operations supported is designed to ensure that conflict handling is consistent and deterministic across nodes, and that the state of all nodes involved automatically converges to the same state.

Last-write-wins strategies offer no such guarantees. Your state diverges uncontrollably and your system will become inconsistent at the first write.

> Making a true system that automatically merges data while respecting user intent and expectations can be an extremely hard problem for anything complex like text.

Again, this shows a complete misunderstanding of the problem domain. CRDTs ensure state converges across nodes, but they absolutely do not reflect "user intent". They just handle merges consistently. User intent is reflected by users applying their changes, which the system then propagates consistently across nodes.

The whole point of CRDTs is state convergence and consistency.

zovirl 4 days ago | parent | next [-]

I think the parent was complaining about mentions of CRDTs which don’t acknowledge that the problem domain CRDTs work in is very low level, and don’t mention how much additional effort is needed to make merging work in a way that’s useful for users.

This article is a perfect example: it says syncing is a challenge for local-first apps, logical clocks and CRDTs are the solution, and then just ends. It ignores the elephant in the room: CRDTs get you consistency, but consistency isn’t enough.

Take a local-first text editor, for example: a CRDT ensures all nodes eventually converge on the same text, but doesn’t guarantee the meaning or structure is preserved. Maybe the text was valid English, or JSON, or alphabetized, but after the merge, it may not be.

My suspicion, and I might be going out on a limb here, is that articles don’t talk about this because there is no good solution to merging for offline or local-first apps. My reasoning is that if there was a good solution, git would adopt it. The fact that git still makes me resolve merge conflicts manually makes me think no one has found a better way.

account42 3 days ago | parent [-]

There is definitely no general solution but for some domains there may be acceptable solutions.

Git is a good example though as we can definitely write merge algorithms that get good results in many more cases than git's default but with something like code it's preferable to let the human user decide what is the correct merge solution except trivial cases. Still, a language aware merge algorithm could do a lot better than git in both automatically merging more cases and refusing to merge nonsensical combinations of commits that don't touch the same lines.

preommr 4 days ago | parent | prev | next [-]

Not the parent comment, but I'll respond.

> Your comment shows some ignorance and a complete misunderstanding of the problem domain.

oof, this is a very strong position to take, and one would assume you have a very convincing follow up to back it up.

And unfortunately I don't think you do. CRDTs can definitely be implemented as a last-write implementation. This should be obvious for state-based CRDTs. The problem is that it's a horrible UX because somebody could type a response to something that's out of date, and then just see it disappear as they get the most recent message.

Resolving "user intent" by choosing how to structure the problem domain (e.g. storing ids for lines, and having custom merging solutions) so that it reflects what the user is trying to do is the main challenge.

I am quite frankly baffled at how arrogant your tone is given how far off the mark you seem to be. Genuinely makes me think that I am missing something given your confidence, but I don't see what point you're making tbh.

coldtea 3 days ago | parent | prev [-]

>Your comment shows some ignorance and a complete misunderstanding of the problem domain

Imagine how much better your comment would be if you omitted the above line, which adds nothing to the correction you try to make, but comes off as stand-offish.