reissbaker 9 hours ago
The RTX 6000 Pro retails for $10k, so an 8x build is $80k before anything else in the computer, and long-context performance will be... pretty bad (20+ seconds of waiting before any tokens come out), but it's true that it technically works. I don't think cloud models are going away; the hardware for good perf is expensive, and higher-param-count models will remain smarter for a looong time. Even if the hardware cost for kind-of-usable perf fell to only $10k, cloud models would still be way faster, and you'd need a lot of tokens to break even.
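
A rough back-of-envelope of both claims, as a minimal sketch; the prefill throughput and the cloud price below are illustrative assumptions, not figures from the comment:

    # Back-of-envelope for the two claims above. Every number here is an
    # illustrative assumption, not a measurement.
    HARDWARE_COST_USD = 80_000        # 8x RTX 6000 Pro at ~$10k each
    CLOUD_USD_PER_MTOK = 3.00         # assumed blended cloud price per 1M tokens

    # Time to first token is dominated by prefill: the whole prompt must be
    # processed before any output appears.
    context_tokens = 200_000          # an assumed long-context prompt
    prefill_tps = 10_000              # assumed prefill throughput for this box
    print(f"TTFT at {context_tokens:,} ctx: ~{context_tokens / prefill_tps:.0f} s")

    # Break-even: tokens you'd have to buy from the cloud before the rig
    # pays for itself (ignoring power, depreciation, resale value).
    break_even_tok = HARDWARE_COST_USD / CLOUD_USD_PER_MTOK * 1e6
    print(f"Break-even vs cloud: ~{break_even_tok / 1e9:.0f}B tokens")

Under those assumed numbers, the 20-second wait corresponds to a ~200k-token prompt, and the rig only pays for itself after tens of billions of cloud tokens.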
zozbot234 9 hours ago
> I don't think cloud models are going away; the hardware for good perf is expensive

I think local AI will win in its niche by repurposing users' existing hardware, especially as cloud hardware itself gets increasingly bottlenecked in all sorts of ways and the price of cloud tokens rises. You don't have to care about "bad" performance when you've got dedicated hardware that runs your workloads 24/7. Time-critical work that also requires the latest and greatest model can stay on the cloud, but a vast amount of AI work just isn't that critical.
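
To put a number on the 24/7 point, a minimal sketch; the decode speed, power draw, and electricity price are all assumptions:

    # Marginal cost of running an already-owned GPU around the clock.
    # All numbers are assumptions for illustration.
    decode_tps = 30                   # tokens/sec a prosumer GPU might sustain
    tokens_per_day = decode_tps * 3600 * 24
    power_kw = 0.4                    # assumed draw under load
    usd_per_kwh = 0.15                # assumed electricity price
    daily_power_usd = power_kw * 24 * usd_per_kwh
    usd_per_mtok = daily_power_usd / tokens_per_day * 1e6
    print(f"~{tokens_per_day / 1e6:.1f}M tokens/day at ~${usd_per_mtok:.2f}/Mtok marginal cost")

Since the hardware is already paid for, only power counts against the cloud price; that is the crux of the repurposing argument.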
alfiedotwtf an hour ago
If 8x RTX 6000s are getting you 20s before the first token, how are cloud vendors getting fast first tokens?
otabdeveloper4 2 hours ago
> higher-param-count models will remain smarter for a looong time

They're not smarter, they just know more stuff. You probably don't need knowledge about Pokemon or the Diamond Sutra in your enterprise coding LLM. The "smarts" come from post-training, especially around tool use.