segmondy 14 hours ago

It's for fools. I bought 160 GB of VRAM for $1,000 last year. 96 GB of P40 VRAM can be had for under $1,000, and it will run gpt-oss-120b at Q8 at probably 30 tok/sec.
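A rough sanity check on that 30 tok/sec figure: single-stream decode is usually memory-bandwidth bound, so the ceiling is roughly bandwidth divided by bytes read per token. The sketch below assumes (these numbers are not from the thread) that gpt-oss-120b is a mixture-of-experts model with about 5.1B active parameters per token, that Q8 stores about 1 byte per parameter, and that the 4x P40 setup is split layer-wise so the effective per-token bandwidth is about one card's ~347 GB/s:

```python
# Memory-bandwidth ceiling estimate for single-stream decode on P40s.
# Assumptions (hedged, not from the thread): ~5.1B active params/token
# for gpt-oss-120b, ~1 byte/param at Q8, layer-wise split across cards
# so effective bandwidth ~= one P40's spec-sheet 347 GB/s.
P40_BANDWIDTH_GBPS = 347.0   # GB/s per card, spec value
ACTIVE_PARAMS_B = 5.1        # billions of active params per token (assumed)
BYTES_PER_PARAM = 1.0        # Q8 quantization, roughly 1 byte/param

bytes_per_token_gb = ACTIVE_PARAMS_B * BYTES_PER_PARAM   # ~5.1 GB read/token
ceiling_tok_s = P40_BANDWIDTH_GBPS / bytes_per_token_gb
print(f"theoretical ceiling: {ceiling_tok_s:.0f} tok/s")
```

Real decode lands well under the ceiling (kernel launch overhead, KV-cache reads, imperfect overlap), so a claimed ~30 tok/sec against a ~68 tok/sec bound is at least plausible under these assumptions.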

timschmidt 14 hours ago | parent [-]

The P40 is Pascal architecture (sold under the Tesla brand), which is no longer receiving driver or CUDA updates, and it's only available as used hardware. Fine for hobbyists, startups, and home labs, but there is likely a growing market of businesses too large to depend on used gear from eBay yet too small for a full rack solution from Nvidia. Seems like that's who they're targeting.

segmondy 13 hours ago | parent [-]

99% of interest is in inference. If you want to fine-tune a model, just rent the best gpu in the cloud. It's often cheaper and faster.
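The "often cheaper" claim comes down to a break-even calculation: owning only wins once you've used the hardware for enough GPU-hours. A minimal sketch, where every number is an illustrative assumption (the $1,000 figure is from the thread; the power and rental rates are made up):

```python
# Hypothetical rent-vs-buy break-even for occasional fine-tuning.
# All rates below are assumptions for illustration, not quotes.
BUY_COST = 1000.0          # used multi-GPU rig, USD (thread's figure)
POWER_COST_PER_HR = 0.15   # ~1 kW draw at an assumed $0.15/kWh
RENT_PER_HR = 2.00         # assumed cloud rate for a comparable GPU

# Hours of rental that would cost as much as buying (net of power).
break_even_hours = BUY_COST / (RENT_PER_HR - POWER_COST_PER_HR)
print(f"break-even: {break_even_hours:.0f} GPU-hours")
```

Under these assumed rates, renting stays cheaper until roughly 540 hours of use; a few fine-tuning runs a year rarely cross that line, which is the comment's point. The cloud GPU is also typically faster hardware, shifting the break-even further out.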

timschmidt 13 hours ago | parent [-]

Great option if you don't mind sharing your data with the cloud. Some businesses want to own the hardware their data resides on.

cootsnuck 13 hours ago | parent | next [-]

How many businesses have the capabilities and expertise to train their own models?

timschmidt 13 hours ago | parent [-]

No idea. Probably more every day.

segmondy 13 hours ago | parent | prev [-]

Renting a GPU, how is that sharing data with the cloud? You can rent a GPU from GCP or AWS.

timschmidt 12 hours ago | parent [-]

I suppose if I rent a cloud GPU and just let it sit there dark and do nothing then I wouldn't have to move any data to it. Otherwise, I'm uploading some kind of work for it to do. And that usually involves some data to operate on. Even if it's just prompts.

segmondy an hour ago | parent [-]

So you also believe that when you rent a server you are sharing your data with the cloud? Are AWS and GCP copying all the private data on their servers? Give me a break. There's a big difference between renting a server and using an API.