hedora 6 hours ago
I’m surprised (unless they replaced the core tcmalloc algorithm but kept the name). tcmalloc (thread-caching malloc) assumes memory allocations have good thread locality. That is often a double win: less false sharing of cache lines, and most allocations hit thread-local data structures in the allocator. Multithreaded async systems destroy that locality, so the allocator constantly has to take the exceptional path: thread A allocates a buffer, the request goes async and wakes up on thread B, and B frees the buffer, which means synchronizing with A to give it back. Are you using async Rust or sync Rust?
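The cross-thread free described above can be sketched as a toy model. This is a hypothetical allocator written only to illustrate the comment's scenario, not tcmalloc's actual design: every block remembers which thread cache it came from, a same-thread free is a lock-free push onto a private list, and a free from another thread must lock the owner's "remote free" list. Sending the block over a channel stands in for an async task resuming on a different worker thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// A block from the hypothetical toy allocator; `owner` is the index of the
/// thread cache that handed it out.
#[derive(Debug)]
pub struct Block {
    pub owner: usize,
}

/// Shared state: one mutex-protected "remote free" list per thread (the slow
/// path), plus counters for fast (same-thread) and slow (cross-thread) frees.
pub struct Shared {
    remote: Vec<Mutex<Vec<Block>>>,
    pub fast_frees: AtomicUsize,
    pub slow_frees: AtomicUsize,
}

/// Per-thread cache: a private free list that needs no synchronization.
pub struct ThreadCache {
    id: usize,
    local: Vec<Block>,
    shared: Arc<Shared>,
}

impl ThreadCache {
    pub fn new(id: usize, shared: Arc<Shared>) -> Self {
        ThreadCache { id, local: Vec::new(), shared }
    }

    /// Fast path: reuse a locally cached block, otherwise make a new one.
    pub fn alloc(&mut self) -> Block {
        self.local.pop().unwrap_or(Block { owner: self.id })
    }

    /// Same-thread frees stay local; cross-thread frees lock the owner's
    /// remote list to "give it back". (A real allocator would later drain
    /// that list into the owner's cache; this toy only counts.)
    pub fn free(&mut self, b: Block) {
        if b.owner == self.id {
            self.shared.fast_frees.fetch_add(1, Ordering::Relaxed);
            self.local.push(b);
        } else {
            self.shared.slow_frees.fetch_add(1, Ordering::Relaxed);
            self.shared.remote[b.owner].lock().unwrap().push(b);
        }
    }
}

/// Thread A allocates `n` blocks. If `migrate` is true, each block is handed
/// to thread B to free (the request "woke up" elsewhere); otherwise A frees it.
/// Returns (fast_frees, slow_frees).
pub fn simulate(n: usize, migrate: bool) -> (usize, usize) {
    let shared = Arc::new(Shared {
        remote: (0..2).map(|_| Mutex::new(Vec::new())).collect(),
        fast_frees: AtomicUsize::new(0),
        slow_frees: AtomicUsize::new(0),
    });
    let (tx, rx) = mpsc::channel::<Block>();

    let s_a = Arc::clone(&shared);
    let ta = thread::spawn(move || {
        let mut cache = ThreadCache::new(0, s_a);
        for _ in 0..n {
            let blk = cache.alloc();
            if migrate {
                tx.send(blk).unwrap(); // freed on thread B
            } else {
                cache.free(blk); // freed on the allocating thread
            }
        }
        // `tx` is dropped here, which ends thread B's receive loop.
    });

    let s_b = Arc::clone(&shared);
    let tb = thread::spawn(move || {
        let mut cache = ThreadCache::new(1, s_b);
        for blk in rx {
            cache.free(blk);
        }
    });

    ta.join().unwrap();
    tb.join().unwrap();
    (
        shared.fast_frees.load(Ordering::Relaxed),
        shared.slow_frees.load(Ordering::Relaxed),
    )
}

fn main() {
    println!("same thread: {:?}", simulate(1000, false)); // (1000, 0)
    println!("migrated:    {:?}", simulate(1000, true)); // (0, 1000)
}
```

In this toy, every migrated free takes the locked path, which is the cost the comment is pointing at.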
skavi 6 hours ago
Modern tcmalloc uses per-CPU caches via rseq (restartable sequences) [0]. We use async Rust with multithreaded tokio executors (sometimes several in the same application), so thread counts are relatively high.

[0]: https://github.com/google/tcmalloc/blob/master/docs/design.m...
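A rough sketch of why per-CPU caching sidesteps the hand-back problem, under stated assumptions: this is a hypothetical toy, not tcmalloc's code. Caches are keyed by the CPU the caller is currently running on rather than by the thread that allocated the block, so a free simply lands in the current CPU's cache. Real tcmalloc uses rseq to touch that slot without locks; std Rust exposes no rseq API, so a `Mutex` per slot stands in here, and the `cpu` argument is a hypothetical substitute for asking the kernel which CPU we are on.

```rust
use std::sync::Mutex;

/// A block from the toy allocator; note it carries no owner field.
#[derive(Debug)]
pub struct Block;

/// Hypothetical toy model of per-CPU caches. The Mutex per slot is only a
/// stand-in for rseq's restartable critical sections.
pub struct PerCpuCaches {
    slots: Vec<Mutex<Vec<Block>>>,
}

impl PerCpuCaches {
    pub fn new(ncpus: usize) -> Self {
        PerCpuCaches {
            slots: (0..ncpus).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }

    /// Allocate from the cache of the CPU the caller is running on now.
    pub fn alloc(&self, cpu: usize) -> Block {
        self.slots[cpu].lock().unwrap().pop().unwrap_or(Block)
    }

    /// Free into the current CPU's cache, regardless of which thread or CPU
    /// allocated the block: nothing is handed back to an "owner".
    pub fn free(&self, cpu: usize, b: Block) {
        self.slots[cpu].lock().unwrap().push(b);
    }

    /// Number of blocks currently cached on `cpu`.
    pub fn cached(&self, cpu: usize) -> usize {
        self.slots[cpu].lock().unwrap().len()
    }
}

fn main() {
    let caches = PerCpuCaches::new(2);
    // A task allocates while its worker thread is running on CPU 0...
    let buf = caches.alloc(0);
    // ...awaits, and is resumed by another worker running on CPU 1, which
    // frees the buffer straight into CPU 1's cache.
    caches.free(1, buf);
    // prints "cpu0 cached=0, cpu1 cached=1"
    println!("cpu0 cached={}, cpu1 cached={}", caches.cached(0), caches.cached(1));
}
```

The design choice being illustrated: with many tokio worker threads but a fixed number of CPUs, keying caches by CPU keeps the cache count bounded and makes task migration between threads mostly irrelevant to the free path.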