Kirby64 | 3 hours ago
> Would "lots of GPUs" even help for huge models? Maybe this is exposing my lack of knowledge, but don't you need to keep the whole model and context in a single GPU's VRAM?

How do you think the large providers do inference? No single GPU has 1 TB+ of memory on board. It's a cluster of many GPUs, with the model sharded across them.
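To make the sharding idea concrete, here's a toy NumPy sketch of tensor parallelism, one common way to split a model across GPUs. Plain arrays stand in for devices here; real systems (e.g. with NCCL collectives) follow the same math: each device holds only a slice of a layer's weights, computes a partial result, and the results are combined.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.asarray(rng.standard_normal((1, 8)))   # activation, replicated on every device
W = np.asarray(rng.standard_normal((8, 16)))  # full weight matrix, "too big" for one GPU

n_devices = 4
# Column-wise shard: each "device" stores an 8x4 slice instead of the full 8x16.
shards = np.split(W, n_devices, axis=1)

# Each device computes its partial output independently...
partials = [x @ w for w in shards]
# ...and an all-gather step concatenates them into the full output.
y_parallel = np.concatenate(partials, axis=1)

# Matches what a single GPU holding the whole matrix would compute.
y_reference = x @ W
assert np.allclose(y_parallel, y_reference)
```

So no single device ever needs the whole model in VRAM; the cost is the inter-GPU communication to gather the partial results, which is why providers run these clusters on fast interconnects.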