Read Locks Are Not Your Friends (eventual-consistency.vercel.app)
15 points by emschwartz 3 days ago | 16 comments
amluto an hour ago
The code examples are confusing. They show the code that takes the locks, but they don't show any of the data structures involved. The rwlock variant clones the Arc (makes sense), but the mutex variant does not (is it hidden inside inner.get?). In any case, optimizing this well would require a lot more knowledge of what's going on under the hood. What are the keys? Can the entire map be split into several maps? Can a reader hold the rwlock across multiple lookups? Is a data structure using something like RCU an option?
ot an hour ago
This is drawing broad conclusions from a specific RW mutex implementation. Other implementations adopt techniques that make readers scale linearly in the read-mostly case by using per-core state (the drawback is that write locks need to scan it). One example is folly::SharedMutex, which is very battle-tested: https://uvdn7.github.io/shared-mutex/

There are more sophisticated techniques, such as RCU or hazard pointers, that make synchronization overhead almost negligible for readers, but they generally require designing the algorithm around them and are not drop-in replacements for a simple mutex, so a good RW mutex implementation is a reasonable default.
sevensor 25 minutes ago
Does this also apply to std::shared_mutex in C++? If so, this is a timely article; I'm in the middle of some C++ multithreading work that relies on a shared_mutex. I have some measuring to do.
_dky an hour ago
If the implementation is task-based and a task always runs on the same virtual CPU (with slots equal to the number of CPUs, i.e. the parallelism), I wonder if something like the following might help. The RW lock could be implemented as an array of length equal to the number of slots, with proper padding to ensure each slot is in its own cache line (to avoid invalidating CPU caches when different slots are read/written). For a read lock: each task acquires the lock for its own slot. For a write lock: acquire the locks from the leftmost slot to the rightmost. Writes can delay readers: a writer blocked on an in-flight reader at one slot, while moving from left to right, still holds the locks for all slots to its left. I do not know how Rust RW locks are implemented.
whizzter an hour ago
I'd be super interested in how this compares across CPU architectures. Is there something about Apple silicon that makes this bad, while it would fly on Intel/AMD CPUs?
Retr0id an hour ago
claudes love to talk about The Hardware Reality
api 39 minutes ago
Take a look at crates like arc_swap if you have a read-often, write-rarely lock case. You can easily implement the RCU pattern; just be sure to read about how to use RCU properly. Done well, this pattern gives you nearly free reads and cheap writes, sometimes cheaper than a lock. For frequent writes a good RWLock is often better, since RCU can degrade rapidly and badly under write contention.