zargon 3 days ago

We will buy 4 cards if they are 48 GB or more. At a measly 16 GB, we’re just going to stick with 3090s, P40s, MI50s, etc.

> 3x VRAM speed and 3x compute

LLM scaling doesn’t work this way. With 4 cards, you may get a 2x performance increase if you use vLLM, but you’ll also need enough VRAM to run FP8. 3 cards would only run at 1x performance.
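One likely reason for the 3-vs-4 gap, sketched below: vLLM's tensor parallelism requires the model's attention head count to be divisible by the tensor-parallel size. If a model needs all three cards just to fit in VRAM, the fallback is usually splitting by layers, which adds memory but little single-request speed. The head count of 64 and the helper name `can_shard_heads` are assumptions for illustration, not taken from any specific model or from vLLM's API.

```python
def can_shard_heads(num_gpus: int, num_attention_heads: int) -> bool:
    """Return True if the attention heads split evenly across the GPUs.

    Tensor parallelism gives each GPU an equal slice of the heads,
    so the GPU count has to divide the head count.
    """
    return num_gpus > 0 and num_attention_heads % num_gpus == 0


# Hypothetical model with 64 attention heads (assumption, for illustration).
NUM_HEADS = 64

for gpus in (3, 4):
    verdict = "can" if can_shard_heads(gpus, NUM_HEADS) else "cannot"
    print(f"{gpus} GPUs {verdict} split {NUM_HEADS} heads evenly")
```

Most models have head counts that are powers of two or small multiples of them, so 2, 4, or 8 cards tend to work where 3 does not.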