| ▲ | crazygringo 4 days ago |
| > The Solution: CRDTs. The right approach is CRDTs (Conflict-Free Replicated Data Types)... This means you can apply messages in any order, even multiple times, and every device will still converge to the same state. This is very much "draw the rest of the owl". Creating a CRDT model for your data that matches intuitive user expectations and obeys consistent business logic is... not for the faint of heart. Also remember it turns your data model into a bunch of messages that then need to be constantly reconstructed into the actual current state of data. It's a gigantic enormous headache. |
|
| ▲ | Aurornis 4 days ago | parent | next [-] |
| Almost every time I see CRDTs mentioned it’s used as a magic device that makes conflicts disappear. The details, of course, are not mentioned. Technically an algorithm that lets the last writer win is a CRDT because there is no conflict. Making a true system that automatically merges data while respecting user intent and expectations can be an extremely hard problem for anything complex like text. Another problem is that in some situations using a CRDT to make the conflict disappear isn’t even the right approach. If two users schedule a meeting for the same open meeting room, you can’t avoid issues with an algorithm. You need to let them know about the conflict so they can resolve the issue. |
| |
| ▲ | josephg 3 days ago | parent | next [-] | | > Technically an algorithm that lets the last writer win is a CRDT because there is no conflict. Yes. It’s called a LWW register in the literature. Usually a MV (multi value) register is more useful. When there is a conflict, MV registers store all conflicting values and make any subsequent reader figure out what to do. But for my money, usually what you want is something operation based. Like, if we both append to a transaction list, the list should end up with both items. Operation based CRDTs can handle any semantics you choose - so you can mix and match different merging approaches based on the application. And the storage for LWW and MV is the same, so you can start simple and grow your app as needed. IMO the reason local first software is still not popular is the same reason encrypted messaging apps took awhile. There’s a lag from good CS research enabling a class of applications and good implementations being available with good UX. Most CRDT implementations are still pretty new. Automerge has been really slow until quite recently. Most libraries don’t support ephemeral data or binary blobs well. And there aren’t a lot of well defined patterns around user login and authentication. Local first apps today have a lot of work ahead of them to just exist at all. Give it some time. | |
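To make the register terminology above concrete, here is a minimal sketch of a LWW register and an MV register. It assumes logical timestamps and string replica IDs, and every class and function name is hypothetical; real libraries such as Automerge represent this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-writer-wins: keep the value with the highest (timestamp, replica_id)."""
    value: object = None
    timestamp: int = 0    # assumed to be a logical clock, e.g. a Lamport timestamp
    replica_id: str = ""  # tie-breaker so concurrent writes resolve identically everywhere

    def set(self, value, timestamp, replica_id):
        return self.merge(LWWRegister(value, timestamp, replica_id))

    def merge(self, other):
        # The larger (timestamp, replica_id) pair wins on every replica.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self


def dominates(a: dict, b: dict) -> bool:
    """True if version vector a has seen everything in b, plus something more."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))


class MVRegister:
    """Multi-value: keep every concurrent value and let the reader decide."""

    def __init__(self, entries=()):
        # Each entry is (value, version_vector); the vector is a sorted tuple so it hashes.
        self.entries = frozenset(entries)

    def write(self, value, replica_id):
        # A new write supersedes everything this replica has already seen.
        vv = {}
        for _, vec in self.entries:
            for k, n in vec:
                vv[k] = max(vv.get(k, 0), n)
        vv[replica_id] = vv.get(replica_id, 0) + 1
        return MVRegister({(value, tuple(sorted(vv.items())))})

    def merge(self, other):
        union = self.entries | other.entries
        # Drop any entry that some other entry causally supersedes.
        keep = {e for e in union
                if not any(dominates(dict(o[1]), dict(e[1])) for o in union)}
        return MVRegister(keep)

    def values(self):
        return sorted(v for v, _ in self.entries)
```

With two concurrent writes, the LWW register silently picks one, while the MV register returns both values until a later write resolves them.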
| ▲ | LanceH 4 days ago | parent | prev | next [-] | | The details appear to be not giving a damn about changes to data. I personally wouldn't describe that as "converging". As it says, this is "Last Write Wins", which is great if the last writer is always correct. "If it’s older → ignore" -- yea, I guess that's a solution but I would really have to look for the problem. I've gone down this road and considered GitHub (or similar) as the backing database. In the end, I just don't have an answer that isn't specific to nearly every field that might be entered. A notes field might be appended. First in might be important, or last in (if it can be trusted as correct). Usually it's something like, "which of these two did you mean to capture?" | | |
| ▲ | mcv 3 days ago | parent [-] | | Funny thing is that the article gives an example where "last write wins" is quite clearly a bad solution. Balance = 100
A: balance = 120
B: balance = 80 Clearly, these are transactions, and it doesn't matter in which order they're applied, but it does matter that they're both executed. End balance should be 100, not 80 or 120. | | |
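The balance example can be sketched as operations rather than overwrites. This is a minimal illustration, assuming A's write really means "+20" and B's means "-20", and that each operation carries a unique ID so redelivery is harmless; the `Ledger` class and the op IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    opening_balance: int
    applied: dict = field(default_factory=dict)  # op_id -> delta

    def apply(self, op_id: str, delta: int) -> None:
        # setdefault makes a repeated delivery of the same op a no-op (idempotent).
        self.applied.setdefault(op_id, delta)

    @property
    def balance(self) -> int:
        # Addition commutes, so the order in which ops arrived does not matter.
        return self.opening_balance + sum(self.applied.values())


ops = [("A-1", +20), ("B-1", -20)]

replica_1 = Ledger(100)
for op_id, delta in ops:
    replica_1.apply(op_id, delta)

replica_2 = Ledger(100)
for op_id, delta in reversed(ops + ops):  # reversed order, each op delivered twice
    replica_2.apply(op_id, delta)
```

Both replicas end at 100. The sketch does not enforce invariants such as "balance never goes negative"; that kind of rule still needs coordination or a business process.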
| ▲ | LanceH 3 days ago | parent [-] | | I was thinking about this overnight and maybe my beef is the article feels like it's written as a solution to "offline", when really it's a much narrower solution. This solution doesn't nearly move us toward making local-first apps more popular, which was nominally the theme. |
|
| |
| ▲ | rogerrogerr 4 days ago | parent | prev | next [-] | | I’ve never really thought about this - how does Outlook handle this? Has anyone received a “sorry, that room you reserved actually wasn’t available; talk to this other dude who reserved it too” message after reserving a meeting room? Or does it just double book the room? Or is there a global lock on any transaction affecting a meeting room, so only one goes through? (Feels like it doesn’t scale) | | |
| ▲ | appreciatorBus 4 days ago | parent | next [-] | | In Google Workspace, rooms are resources with calendars that can be configured to auto accept any invitation unless they’re already booked. So it’s basically first come, first served. Even if two people are literally trying to book the room at the same time, one request will go through first and will be accepted and the second will be declined. I imagine Outlook is similar. | | |
| ▲ | yccs27 4 days ago | parent [-] | | In other words, Google sacrifices availability/latency here - they don't accept the request until they can be sure it's still available. | | |
| ▲ | coldtea 3 days ago | parent [-] | | They can accept the request (accept as in receive for processing). They just can't send the acknowledgement of "successfully booked" yet. |
|
| |
| ▲ | account42 3 days ago | parent | prev | next [-] | | > Or is there a global lock on any transaction affecting a meeting room, so only one goes through? (Feels like it doesn’t scale) Why wouldn't it scale? How many meetings are booked per second in your organization??? | | |
| ▲ | throwaway4226 3 days ago | parent | next [-] | | I think the potential for abuse is high. With a locking system, someone could (and probably would) click (manually or with a script) on a time slot to "reserve" a room just in case they needed it. | | |
| ▲ | jon-wood 3 days ago | parent | next [-] | | These are physical meeting rooms within a company. The resolution to this sort of abuse doesn't need to be automated, first it's a person in the facilities team having a quiet chat with the person doing that and asking them not to, eventually it gets escalated through various managers until it's a very final chat with HR before being asked to leave the building and not come back. | |
| ▲ | coldtea 3 days ago | parent | prev [-] | | And if they did it a lot, they're scolded or fired. That's not a real problem - at least not in the "book a corporate meeting room" space. |
| |
| ▲ | kirici 3 days ago | parent | prev [-] | | Clearly you fell for the premature measuring fallacy, everyone knows to optimize for web-scale first. |
| |
| ▲ | bmm6o 4 days ago | parent | prev | next [-] | | Exchange server accepts or rejects meeting requests. There's no offline room reservation so it's pretty simple. | | |
| ▲ | bootsmann 3 days ago | parent [-] | | Presumably exchange server is not a single node? | | |
| ▲ | immibis 3 days ago | parent [-] | | Then it does whatever is needed to make it safe. For example, it might use a hash ring to assign each meeting room to a single node, and that node processes one request at a time. Most distributed systems are like this. A traditional database funnels all your data changes down to one leader node which then accepts or rejects them, and (if A+C in the case of single node failure is desired) makes sure the data is replicated to follower nodes before accepting. A distributed database is similar but different pieces of data can be on different leaders and different follower sets. This comment was rate-limited. |
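The hash-ring idea above can be sketched roughly as follows. This is an illustration only, not how Exchange actually works; the node names, `HashRing`, and `RoomNode` are hypothetical, and replication to followers is left out.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable hash, so every process maps the same key to the same ring position.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Several virtual points per node spread the rooms more evenly.
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def owner(self, room: str) -> str:
        # First ring point clockwise from the room's hash, wrapping around at the end.
        i = bisect.bisect_right(self._keys, _h(room)) % len(self._ring)
        return self._ring[i][1]


class RoomNode:
    """Owns a subset of rooms and handles their requests one at a time."""

    def __init__(self):
        self.bookings = {}  # (room, slot) -> user

    def book(self, room: str, slot: str, user: str) -> bool:
        if (room, slot) in self.bookings:
            return False  # someone got there first; reject
        self.bookings[(room, slot)] = user
        return True


node_names = ["n1", "n2", "n3"]
ring = HashRing(node_names)
nodes = {name: RoomNode() for name in node_names}

owner = nodes[ring.owner("room-42")]
first = owner.book("room-42", "2025-01-01T10:00", "alice")
second = owner.book("room-42", "2025-01-01T10:00", "bob")
```

Because every request for a given room reaches the same node, "first come, first served" needs no merge logic at all, at the cost of requiring that node to be reachable.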
|
| |
| ▲ | Aurornis 3 days ago | parent | prev | next [-] | | > I’ve never really thought about this - how does Outlook handle this? Simple: It’s server based. These problems are trivial when the server is coordinating responses and the clients can only reserve a room if the server confirms it. This is the problem that gets hand waved away with local first software that has multi user components: It doesn’t take long before two users do something locally that conflicts. Then what do you do? You have to either force a conflict resolution and get the users to resolve it, or you start doing things like discarding one of the changes so the other wins. | |
| ▲ | coldtea 3 days ago | parent | prev [-] | | It doesn't scale universally, but it doesn't need to: it only needs to cover a specific company/organization/department. So it's trivial to work at that scale. Hell, it's so feasible, it can even be done manually IRL by some person (like discussions where a person holds the "talking stick" and only they are allowed to speak until they pass it to another person - that's a lock). |
| |
| ▲ | tylerchilds 4 days ago | parent | prev | next [-] | | You can resolve it with an algorithm, like so - prefer seniority
- prefer pay scale
- prefer alphabetical
- roll dice That’s how a business would probably do it since the first two alone align with how the business already values their Human Resources, which would translate to “the objects that the Human Resources compete for” | | |
| ▲ | arccy 3 days ago | parent [-] | | And the interns get the blame for why they can't book rooms, but for the people managing them it's just so easy. | | |
| ▲ | tylerchilds 3 days ago | parent [-] | | Incorrect. In a well designed system, the intern will be delegated with “room booking authority” on behalf of their most senior manager on the calendar invite. Using something like this, that would be in the CRDT resolution algorithm. https://w3c-ccg.github.io/zcap-spec/ Company culture will recognize it is an HR problem. |
|
| |
| ▲ | motorest 4 days ago | parent | prev [-] | | > Technically an algorithm that lets the last writer win is a CRDT because there is no conflict. Your comment shows some ignorance and a complete misunderstanding of the problem domain. The whole point of CRDTs is that the set of operations supported is designed to ensure that conflict handling is consistent and deterministic across nodes, and the state of all nodes involved automatically converges to the same state. Last-write-wins strategies offer no such guarantees. Your state diverges uncontrollably and your system will become inconsistent at the first write. > Making a true system that automatically merges data while respecting user intent and expectations can be an extremely hard problem for anything complex like text. Again, this shows a complete misunderstanding of the problem domain. CRDTs ensure state converges across nodes, but they absolutely do not reflect "user intent". They just handle merges consistently. User intent is reflected by users applying their changes, which the system then propagates consistently across nodes. The whole point of CRDTs is state convergence and consistency. | | |
| ▲ | zovirl 4 days ago | parent | next [-] | | I think the parent was complaining about mentions of CRDTs which don’t acknowledge that the problem domain CRDTs work in is very low level, and don’t mention how much additional effort is needed to make merging work in a way that’s useful for users. This article is a perfect example: it says syncing is a challenge for local-first apps, logical clocks and CRDTs are the solution, and then just ends. It ignores the elephant in the room: CRDTs get you consistency, but consistency isn’t enough. Take a local-first text editor, for example: a CRDT ensures all nodes eventually converge on the same text, but doesn’t guarantee the meaning or structure is preserved. Maybe the text was valid English, or JSON, or alphabetized, but after the merge, it may not be. My suspicion, and I might be going out on a limb here, is that articles don’t talk about this because there is no good solution to merging for offline or local-first apps. My reasoning is that if there was a good solution, git would adopt it. The fact that git still makes me resolve merge conflicts manually makes me think no one has found a better way. | | |
| ▲ | account42 3 days ago | parent [-] | | There is definitely no general solution but for some domains there may be acceptable solutions. Git is a good example though as we can definitely write merge algorithms that get good results in many more cases than git's default but with something like code it's preferable to let the human user decide what is the correct merge solution except trivial cases. Still, a language aware merge algorithm could do a lot better than git in both automatically merging more cases and refusing to merge nonsensical combinations of commits that don't touch the same lines. |
| |
| ▲ | preommr 4 days ago | parent | prev | next [-] | | Not the parent comment, but I'll respond. > Your comment shows some ignorance and a complete misunderstanding of the problem domain. oof, this is a very strong position to take, and one would assume you have a very convincing follow up to back it up. And unfortunately I don't think you do. CRDTs can definitely be implemented as a last-write implementation. This should be obvious for state-based CRDTs. The problem is that it's a horrible UX because somebody could type a response to something that's out of date, and then just see it disappear as they get the most recent message. Resolving "user intent" by choosing how to structure the problem domain (e.g. storing ids for lines, and having custom merging solutions) so that it reflects what the user is trying to do is the main challenge. I am quite frankly baffled at how arrogant your tone is given how far off the mark you seem to be. Genuinely makes me think that I am missing something given your confidence, but I don't see what point you're making tbh. | |
| ▲ | coldtea 3 days ago | parent | prev [-] | | >Your comment shows some ignorance and a complete misunderstanding of the problem domain Imagine how much better your comment would be if you omitted the above line, which adds nothing to the correction you try to make, but comes off as stand-offish. |
|
|
|
| ▲ | teleforce 4 days ago | parent | prev | next [-] |
| There is a next-gen web standards initiative, namely BRAID, that will make the web more human and machine friendly with a synchronous web of state [1],[2],[3]. "Braid’s goal is to extend HTTP from a state transfer protocol to a state sync protocol, in order to do away with custom sync protocols and make state across the web more interoperable. Braid puts the power of operational transforms and CRDTs on the web, improving network performance and enabling natively p2p, collaboratively-editable, local-first web applications." [4] [1] A Synchronous Web of State: https://braid.org/meeting-107 [2] Braid: Synchronization for HTTP (88 comments): https://news.ycombinator.com/item?id=40480016 [3] Most RESTful APIs aren't really RESTful (564 comments): https://news.ycombinator.com/item?id=44507076 [4] Braid HTTP: https://jzhao.xyz/thoughts/Braid-HTTP |
|
| ▲ | cyberax 4 days ago | parent | prev | next [-] |
| We have a local-first app. Our approach? Just ignore the conflicts. The last change wins. No, really. In practice for most cases the conflicts are either trivial, or impossible. Trivial conflicts like two people modifying the same note are trivial for users, once you have a simple audit log. And impossible conflicts are impossible to solve automatically anyway and require business processes around them. Example: two people starting to work on the same task in an offline-enabled task tracker. |
| |
| ▲ | moggers123 4 days ago | parent | next [-] | | >Example: two people starting to work on the same task in an offline-enabled task tracker.
Wouldn't this just mean both people are working on it? I agree that this means humans intervening. It sounds like there was a comms breakdown. But rather than doing a first-in-best-dressed, it sounds like accurately recording that both users are in fact working on the same thing is the best option since it surfaces that intervention is required (or maybe it's intentional, tools insisting that only one person can work on an item at once annoys me). Sounds much better than quietly blowing away one of the user's changes. In principle, local-first to me means each instance (and the actions each user carries out on their instance) is sacrosanct. Server's job is to collate it, not decide what the Truth is (by first-in-best-dressed or otherwise). | | |
| ▲ | cyberax 4 days ago | parent [-] | | Sure. But then you need to notify users when they come back online that there's a conflict, so they can resolve what to do. You likely need to have a report on the frequency of such occasions for the managers, and so on. These kinds of conflicts simply can not be solved by CRDTs or any other automated process. The application has to be designed around that. > In principle, local-first to me means each instance (and the actions each user carries out on their instance) is sacrosanct. Server's job is to collate it, not decide what the Truth is (by first-in-best-dressed or otherwise). This makes sense only for some applications, though. And we have not yet started talking about permissions, access control, and other nice fun things. | | |
| ▲ | evelant 3 days ago | parent | next [-] | | I’ve been experimenting with this, it’s a very interesting problem space! https://github.com/evelant/synchrotron Idea is to sync business logic calls instead of state. Let business logic resolve all conflicts client side. Logical clocks give consistent ordering. RLS gives permissions and access control. No dedicated conflict resolution logic necessary but still guarantees semantic consistency and maximally preserves user intentions. That’s the idea at least, requires more thought and hacking. | |
| ▲ | moggers123 3 days ago | parent | prev [-] | | I doubt you'll ever see this.. Oh well.. I probably should have been explicit in that I'm not arguing in favor of CRDTs, just that the alternative doesn't need to be "send it and accept the collateral". Draw The Rest Of The Owl energy here, but at least it's a nice northern star. |
|
| |
| ▲ | giancarlostoro 3 days ago | parent | prev | next [-] | | Wouldn't it be less of an issue if you track the change history, and let users pick a specific historical version? Then it doesn't matter who wins, the end-user can go in and change it. Version control is one of the best parts about Google Docs. | |
| ▲ | kobieps 3 days ago | parent | prev | next [-] | | Who is the audience of your app? Is it an internal app for a company, or is it a public facing consumer app? | | |
| ▲ | cyberax 3 days ago | parent [-] | | Public app used by professionals in the field, often with poor or no connectivity. Even having a local read-only copy of data is often helpful for them. | | |
| ▲ | kobieps 3 days ago | parent [-] | | Cool. Yeah in my experience last-write-wins is sufficient for 95% of use cases, and if you add audit trails to help resolve any disputes it gets you to 98% |
|
| |
| ▲ | sakesun 4 days ago | parent | prev | next [-] | | Just have an audit log. No need to try solving every trivial case. Make something useful. | |
| ▲ | kazinator 4 days ago | parent | prev [-] | | One solution is to make it so that people see their literal keystrokes in real time. Then they solve the conflict themselves. Like, "stop typing into this text because bob is typing into it". It's like Ethernet conflict resolution: just access the shared medium and detect collisions in real time. | | |
| ▲ | avemg 4 days ago | parent | next [-] | | How will you know that Bob is typing into it if you're offline? | | |
| ▲ | kazinator 4 days ago | parent [-] | | That's a fair question; we here being under a submission about local-first apps, and all. Of course, you know the answer: if you're offline, you're not online. Bob gets to type whatever Bob wants, and until you go online, you don't get to overtype anything. | | |
| ▲ | ongy 4 days ago | parent [-] | | But the offline enabled property allows exactly that. Both sides type offline and only sync later.
Neither would like their change to just be discarded. | | |
| ▲ | kazinator 3 days ago | parent [-] | | I was responding only to the idea of having no conflict resolution: last edit wins (proposed in a great grandparent comment): https://news.ycombinator.com/item?id=45341335 "We have a local-first app. Our approach? Just ignore the conflicts. The last change wins." If you can see the edits being made in real time, keystroke by keystroke, that pretty much solves that problem. As for offline editing, either don't support it (then you're not local-anything obviously) or you can have some lame workflow like "the document was changed by another user ..." |
|
|
| |
| ▲ | cyberax 4 days ago | parent | prev [-] | | It's fine if you're talking about a text editor or an Excel table. And it's one of the few cases where CRDTs make sense. If you have a CRM-like application with a list of users? Not so much. |
|
|
|
| ▲ | jwr 3 days ago | parent | prev | next [-] |
| The author also assumes that users are rational, make no mistakes, and there exists a logical non-conflicting ordering of their updates that makes sense. This is naive, speaking from a perspective of someone who has spent the last 10 years supporting a SaaS mostly for engineers. |
|
| ▲ | kobieps 4 days ago | parent | prev | next [-] |
| Agreed @ not for the faint of heart. There is at least one alternative "CRDT-free" approach for the less brave among us: https://mattweidner.com/2025/05/21/text-without-crdts.html |
| |
| ▲ | quotemstr 4 days ago | parent | next [-] | | > Difference from CRDTs The author has made a CRDT. He denies that his algorithm constitutes a CRDT. It's a straightforward merge, not a "fancy algorithm". What specific aspect of a CRDT does this solution not satisfy? The C? The R? The D? The T? | | |
| ▲ | justinpombrio 4 days ago | parent [-] | | I was going to say that that's not a CRDT because it requires a centralized server (the conflict resolution is "order in which the server received the messages", and clients aren't allowed to share updates with each other, they can only get updates from the server). But now I'm looking at definitions of CRDTs and it's not clear to me whether this is supposed to count or not. Still, every algorithm that's actually labeled a CRDT shares a magical property: if my replica has some changes, and your replica has some changes, our replicas can share their changes with each other and each converge closer to the final state of the document, even if other people have been editing at the same time, and different subsets of their changes have been shared with you or me. That is, you can apply people's changes in any order and still get the same result. I don't think it's useful to call anything without that property a CRDT. | | |
| ▲ | immibis 3 days ago | parent [-] | | The C in CRDT means the order doesn't matter, which means you can just put all the gossiped changes into a big bag of changes and if everyone knows the same changes, they have the same final document, so a simple gossip protocol that just shares unshared data blobs will eventually synchronize the document. If order matters, it's not a CRDT. This one isn't a CRDT because the order matters if two clients insert text at the same position. |
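A small sketch of the order-sensitivity point above. It uses naive index-based inserts for illustration, not the algorithm from the linked post, and contrasts them with a "bag of changes" whose state is just the set of changes seen. All names are hypothetical.

```python
from itertools import permutations


def apply_index_inserts(text, ops):
    """Apply naive (position, string) inserts in the given order."""
    for pos, s in ops:
        text = text[:pos] + s + text[pos:]
    return text


def apply_bag(ops):
    """Treat changes as an unordered bag: the state is the set of changes seen."""
    return frozenset(ops)


# Two clients concurrently insert at the same position of "xy".
ops = [(1, "A"), (1, "B")]

index_results = {apply_index_inserts("xy", order) for order in permutations(ops)}
bag_results = {apply_bag(order) for order in permutations(ops)}
```

The index-based version produces two different documents depending on arrival order, while the bag produces one. Sequence CRDTs typically close that gap by giving each insert an ID and a deterministic tie-break instead of a raw position.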
|
| |
| ▲ | josephg 3 days ago | parent | prev [-] | | Matt Weidner is a really smart guy, but I really disagree with his reasoning with that one. I implemented his fuguemax CRDT in just 250 lines of code or so. It’s small, simple and fast. In that blog post he proposes a different approach that might let you save 50 lines of code at the expense of always needing a centralised server. Seems like a terrible trade to me. Just use a CRDT. They’re good. https://github.com/josephg/crdt-from-scratch |
|
|
| ▲ | perlgeek 3 days ago | parent | prev | next [-] |
| Also, when you get a new requirement that needs a modification of the data model, you have to both remodel your CRDTs and make sure you have a migration strategy. After doing this a few times, your stakeholders are probably fed up with the crawling pace of development, and the whole local-first app is scrapped again and replaced by a more traditional app. Maybe the architect is fired. |
|
| ▲ | yen223 4 days ago | parent | prev | next [-] |
| There are the additional headaches of a) managing auth in a distributed manner, and b) figuring out how to evolve the data model across all participating clients. CRDTs are a complicated way to solve a problem that most services don't really have, which is when you want to sync data but don't want to have any one client be the ultimate source of truth. |
|
| ▲ | BatteryMountain 3 days ago | parent | prev | next [-] |
| Yeah I must say, this one (CRDTs) has been an impossible one to solve for me in practical terms. Every single time we end up with something like this: dataflow has 3 modes/directions: downstream only, upstream only and bi-directional. The majority of systems end up needing bi-directionality (unless dealing with sensors/IoT data) at some point, which means you are forced to deal with conflicts. Which means you end up having to compromise and the simplest compromise for 95% of applications is to say "last one wins", which works nearly perfectly in the real world and is simpler to maintain and debug. The remaining 5% has a hard constraint where you can either go down an academic rabbit hole with CRDTs and come out the other end with some grey hairs, or, you still use your normal data flows but have multistep commits (not in a database sense, but a workflow/saga sense), so you have some supervising object that makes sure both sides agree after x amount of time or revert (think about banks, semi-realtime & distributed, but forced to be consistent). And for the younger devs: please consider if you need these kinds of sync systems at all (distributed & offline modes), sometimes a simple store-forward of events and/or cache is all you need. If you have some leverage, try to advocate that your application has a hard requirement on an internet connection as often it is cheaper to install fibre redundancies than dealing with the side effects of corrupted or lost data. Might save you from early grey hairs. ps: the above is written with business/LoB applications in mind, with some offline mobile/desktop apps in the mix, not things like control systems for factories or real time medical equipment. |
|
| ▲ | motorest 4 days ago | parent | prev | next [-] |
| > Also remember it turns your data model into (...) I don't think this is true at all. A CRDT is not your data model. It is the data structure you use to track and update state. Your data model is a realization of the CRDT at a specific point in time. This means a CRDT instance is owned by a repository/service dedicated to syncing your state, and whenever you want to access anything you query that repository/service to output your data model. Sometimes problems are hard. Sometimes you create your own problems. |
| |
| ▲ | crazygringo 2 days ago | parent [-] | | A CRDT requires its own data model that becomes, in a sense, the "main" data model. Because its design and constraints effectively become the design and constraints and requirements on the downstream data snapshot. CRDTs generally require you to track a lot more stuff in your snapshot model than you would otherwise, in order to create messages that contain enough information to be applied and resolved. E.g. what was previously an ordered list of items without IDs may now need to become a chain of items each with their own ID, so you can record an insert between two other items rather than just update the list object directly. So yes, the CRDT is effectively your data model. |
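The list example above might look roughly like this. It is a simplified sketch in the spirit of sequence CRDTs, not any specific library's model: every item gets an assumed `(counter, replica_id)` ID, each insert names the item it goes after, and the plain list is rebuilt from the set of inserts. Deletion (usually done with tombstones) is left out.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Insert:
    item_id: tuple           # (counter, replica_id), unique per item
    value: str
    after: Optional[tuple]   # ID of the item to the left, or None for the list start


@dataclass
class IdList:
    ops: dict = field(default_factory=dict)  # item_id -> Insert

    def apply(self, op: Insert) -> None:
        self.ops.setdefault(op.item_id, op)  # a duplicate delivery is ignored

    def snapshot(self) -> list:
        """Rebuild the plain ordered list that the rest of the app would see."""
        children = {}
        for op in self.ops.values():
            children.setdefault(op.after, []).append(op)
        out = []

        def walk(parent):
            # Concurrent inserts after the same item are ordered by ID, highest first.
            for op in sorted(children.get(parent, []), key=lambda o: o.item_id, reverse=True):
                out.append(op.value)
                walk(op.item_id)

        walk(None)
        return out


ops = [
    Insert((1, "A"), "milk", None),
    Insert((2, "A"), "eggs", (1, "A")),   # replica A inserts after "milk"
    Insert((2, "B"), "bread", (1, "A")),  # replica B concurrently does the same
]

snapshots = []
for order in permutations(ops):
    lst = IdList()
    for op in order:
        lst.apply(op)
    snapshots.append(lst.snapshot())
```

Every delivery order yields the same snapshot, but the stored model is now the IDs and parent links, and the plain list is derived from them.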
|
|
| ▲ | __MatrixMan__ 4 days ago | parent | prev [-] |
It sort of depends on the owl though, right? If your CRDT is nothing more than a set of tuples updated on the basis of: these are what my peers have... Is there an abyss of complexity that I'm overlooking here, or are simple CRDTs in fact quite simple? |
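For the "set of tuples" case, a minimal sketch really is short. The `GSet` name and the example tuples are hypothetical; this is a grow-only set whose merge is plain set union.

```python
class GSet:
    """Grow-only set: state is a set of tuples, merge is union."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        return GSet(self.items | {item})

    def merge(self, other):
        # Union is commutative, associative and idempotent, so replicas
        # converge regardless of the order or repetition of merges.
        return GSet(self.items | other.items)


a = GSet().add(("alice", "task-1"))
b = GSet().add(("bob", "task-2"))
c = GSet().add(("carol", "task-3"))
```

The difficulty tends to show up once items must be removed or edited, since union alone can only ever add.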