Jabrov 3 hours ago

Yes, multiple GPUs absolutely help with inference, even for a single model instance. Some models are simply too big to fit on even the largest available GPU, so their weights have to be sharded across devices.

Check out tensor parallelism: it splits each layer's weight matrices across GPUs so every device holds only a shard, computes a partial result, and the shards are combined with a collective op.
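A minimal sketch of the idea, using NumPy to stand in for per-device tensors (real systems use something like torch.distributed or a serving engine's tensor-parallel setting; the two-way split here is just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # activations, replicated on every device
W = rng.standard_normal((8, 16))   # full weight matrix (pretend it won't fit on one GPU)

# Column-parallel split: "device 0" holds columns 0..7, "device 1" holds 8..15.
shards = np.split(W, 2, axis=1)

# Each device computes its partial output independently...
partials = [x @ w for w in shards]

# ...then an all-gather concatenates the partials into the full output.
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W

assert np.allclose(y_parallel, y_full)
```

The same trick works row-wise (split along the input dimension, then all-reduce the partial sums), which is how transformer MLP and attention blocks are typically parallelized.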