chrsw 17 hours ago:
> run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this

What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?

embedding-shape 16 hours ago:
An RTX Pro 6000. It ends up taking ~66GB when running the MXFP4 native quant with llama-server/llama.cpp at max context, as an example. You could probably also do it with two 5090s at slightly reduced context, or with different software aimed at memory efficiency.
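
For concreteness, a minimal client-side sketch of what that setup might look like. This is an assumption-laden illustration, not embedding-shape's exact config: the model filename and --ctx-size value are made up, and it relies on llama-server's OpenAI-compatible /v1 endpoint on its default port 8080.

    # Assumes llama-server (from llama.cpp) was started with something like:
    #   llama-server -m gpt-oss-120b-mxfp4.gguf -ngl 99 --ctx-size 131072 --port 8080
    # (model filename and context size here are assumptions, not a tested config)
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="gpt-oss-120b",  # name is arbitrary when llama-server hosts one model
        messages=[{"role": "user",
                   "content": "Write a Python function that reverses a string."}],
    )
    print(resp.choices[0].message.content)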

fgonzag 16 hours ago:
The model is 64GB (int4/MXFP4 native); add 20GB or so for context (rough arithmetic below). There are many platforms out there that can run it decently: AMD Strix Halo, Macs with enough unified memory, two of the new AMD Radeon AI Pro R9700 (32GB of VRAM, ~$1200 each; three if you don't want to spill into system RAM), multi-consumer-GPU setups, etc.
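
Back-of-the-envelope check on those numbers (a sketch; the parameter count is approximate, and real MXFP4 checkpoints keep some tensors at higher precision, which is part of why the file lands closer to 64GB than 60GB):

    # Rough memory estimate: weights at ~4 bits/param plus KV cache for context.
    params = 120e9                   # ~120B parameters (approximate)
    weights_gb = params * 0.5 / 1e9  # 4 bits = 0.5 bytes/param -> ~60 GB
    kv_cache_gb = 20                 # rough context figure cited above
    print(f"weights ~{weights_gb:.0f} GB, total ~{weights_gb + kv_cache_gb:.0f} GB")
    # -> weights ~60 GB, total ~80 GB: in the ballpark of the ~64GB model
    #    file plus context discussed in this thread.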

FuckButtons 14 hours ago:
A MacBook Pro with 128GB.