Dylan16807 | 13 hours ago
No, no, nothing like that. Every layer of an LLM runs separately and sequentially, and there isn't much data transfer between layers: only the activations for the current tokens pass from one layer to the next, which is tiny compared with the weights themselves. If you wanted to, you could put each layer on a separate GPU with no real penalty. A single request only runs on one GPU at a time, so it won't go faster than a single GPU with a big RAM upgrade, but it won't go slower either.
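To make the "one layer per GPU" idea concrete, here's a minimal PyTorch sketch of naive pipeline placement. The class name and layer sizes are made up for illustration, and it assumes one CUDA device is available per layer; the point is that only the hidden-state tensor crosses a device boundary between layers.

```python
import torch
import torch.nn as nn

# Toy stack of transformer layers, one per GPU ("naive" pipeline
# parallelism). Only the hidden-state activations move between
# devices, so inter-GPU traffic is small compared with the weights.
class LayerPerGPUStack(nn.Module):
    def __init__(self, num_layers=4, d_model=512, n_heads=8):
        super().__init__()
        # Assumes num_layers CUDA devices are present.
        self.devices = [torch.device(f"cuda:{i}") for i in range(num_layers)]
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True).to(dev)
            for dev in self.devices
        )

    def forward(self, x):
        # Layers run strictly one after another; a single request
        # only ever occupies one GPU at a time.
        for layer, dev in zip(self.layers, self.devices):
            x = layer(x.to(dev))
        return x

model = LayerPerGPUStack()
tokens = torch.randn(1, 16, 512)   # (batch, seq_len, d_model)
out = model(tokens)
print(out.shape, out.device)       # output ends up on the last GPU
```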
oblio | 5 hours ago
Interesting, thank you for the feedback; it's definitely worth looking into!