Aurornis 12 hours ago

> gpt-oss-120b full quant runs on my quad 3090

A 120B model cannot fit on 4 x 24GB GPUs at full quantization.

Either you're confusing this with the 20B model, or you have 48GB modded 3090s.

segmondy an hour ago | parent [-]

Some of you folks on here love to argue. gpt-oss-120b was trained in 4 bits (MXFP4), so the weights only take up about 60 GB.
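A quick back-of-envelope check of that claim (the round 120B parameter count is an approximation; the model's actual total is slightly below that):

```python
# Rough weight-memory estimate for a quantized model.
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """VRAM needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# ~120B parameters at 4 bits per parameter
print(weight_memory_gb(120e9, 4))  # -> 60.0 GB
```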

Aurornis 32 minutes ago | parent [-]

Good point, but you still need the KV cache and more. Fitting the model weights alone into VRAM doesn't get the job done.

segmondy 14 minutes ago | parent [-]

Yeah, it doesn't take much. I'm looking at it right now: the KV cache is about 4 GB of VRAM, and the compute buffer is roughly 1.5 GB at the full 128k context.
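Putting the thread's numbers together, a minimal budget sketch (the 2 GB overhead term is an assumption covering CUDA contexts and fragmentation, not a figure from the thread):

```python
# Rough VRAM budget check using the figures reported above.
def vram_budget_gb(weights_gb: float, kv_cache_gb: float,
                   compute_buf_gb: float, overhead_gb: float = 2.0) -> float:
    """Total VRAM needed, in GB. overhead_gb is an assumed fudge factor."""
    return weights_gb + kv_cache_gb + compute_buf_gb + overhead_gb

needed = vram_budget_gb(weights_gb=60, kv_cache_gb=4, compute_buf_gb=1.5)
available = 4 * 24  # quad 3090
print(needed, needed <= available)  # 67.5 GB needed vs 96 GB available
```

So even with generous headroom, the model comfortably fits in 96 GB across four 3090s.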