▲ | kllrnohj 4 days ago | |
> What kills performance are not memory copies, but locks. I'm pretty sure that if every thread executing an LLM had to have its own copy of the model, that would murder performance more than any lock does, and it wouldn't even be close. It's cheaper to copy than to lock when the data is small, but that does not scale, and it also ignores things like reader/writer locks where the data is primarily read-only, at least during the concurrent stage, or cases where the work can be safely chunked up such that writes never overlap, which is very common in graphics. | ||
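To make the two alternatives concrete, here is a minimal sketch in Rust using only the standard library. The function names `gather_chunked` and `read_concurrently` are hypothetical and made up for illustration, and the tiny lookup table is an assumed stand-in for large read-only data such as model weights. The first function shares one read-only table across threads with no copy and no lock, with each thread writing to its own disjoint output chunk. The second shows the reader/writer lock case, where many readers hold the lock at once.

```rust
use std::sync::RwLock;
use std::thread;

/// Looks up `table[indices[i]]` for every i, splitting the work across threads.
/// `table` is shared read-only (no per-thread copy, no lock), and each thread
/// writes only to its own chunk of `output`, so writes never overlap.
fn gather_chunked(table: &[u32], indices: &[usize], workers: usize) -> Vec<u32> {
    let mut output = vec![0u32; indices.len()];
    if indices.is_empty() {
        return output; // chunks_mut(0) would panic, so handle the empty case first
    }
    let workers = workers.max(1);
    let chunk = (indices.len() + workers - 1) / workers; // ceiling division

    thread::scope(|s| {
        // chunks_mut hands out non-overlapping mutable slices, which is what
        // lets the compiler accept lock-free concurrent writes here.
        for (out_chunk, idx_chunk) in output.chunks_mut(chunk).zip(indices.chunks(chunk)) {
            s.spawn(move || {
                for (out, &i) in out_chunk.iter_mut().zip(idx_chunk) {
                    *out = table[i];
                }
            });
        }
    });
    output
}

/// Sums the shared data from several threads at once. Every thread takes the
/// read side of the RwLock, so readers do not exclude one another.
fn read_concurrently(shared: &RwLock<Vec<u32>>, readers: usize) -> Vec<u64> {
    thread::scope(|s| {
        // Spawn all readers before joining any, so they can run in parallel.
        let handles: Vec<_> = (0..readers)
            .map(|_| {
                s.spawn(|| {
                    let guard = shared.read().unwrap();
                    guard.iter().map(|&v| u64::from(v)).sum::<u64>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Calling `gather_chunked(&table, &indices, 3)` with a 7-element `indices` splits the output into chunks of 3, 3, and 1 elements. Neither function says anything about how expensive a lock is in practice; that depends on contention and would need measuring.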
▲ | littlestymaar 4 days ago | parent [-] | |
> It's cheaper to copy than to lock when the data is small Exactly this. |