Catloafdev 4 hours ago
If you want frontier-level, the economically reasonable option is OpenRouter or a direct sub to the frontier provider of your choice. The reality is that hardware vendors do not offer configurations that would let a consumer run that much VRAM on a single setup, in order to protect datacenter margins. Apple used to, and they stopped; those devices are going for ~$20k+ each on eBay now. You can run very, very capable models on a 3090/4090/5090/6000 series card. But if you want 'frontier level' you are investing ~$22k at a bare minimum if you go new. Buying used, you can probably build your own server for a much lower up-front cost, but it's likely going to use 4-6x+ the electricity.
daemonologist 4 hours ago
There are also significant economies of scale (namely: utilization and batching), which tend to make inference on a shared server more economical even after the operator takes a cut.
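To make the utilization and batching point concrete, here is a rough sketch of amortized cost per million tokens. Everything in it is an assumption for illustration: the function name, the three-year lifetime, the power draw, the electricity price, the throughput figures and the 16x batching multiplier are all hypothetical, and only the ~$22k hardware figure is borrowed from the parent comment. It also assumes the box draws the same power whether idle or busy, which is a simplification.

```python
def cost_per_million_tokens(hardware_cost_usd, lifetime_hours, power_kw,
                            usd_per_kwh, tokens_per_second, utilization):
    """Amortized hardware plus electricity cost per one million generated tokens.

    Simplifying assumption: power draw is constant, so idle hours still cost money.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Fixed cost of keeping the machine around for one hour.
    hourly_cost = hardware_cost_usd / lifetime_hours + power_kw * usd_per_kwh
    # Tokens actually produced in that hour, given how busy the machine is.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


LIFETIME_HOURS = 3 * 365 * 24  # hypothetical three-year amortization window

# Hypothetical single user: one request stream, machine busy 5% of the time.
home = cost_per_million_tokens(22_000, LIFETIME_HOURS, 0.5, 0.15,
                               tokens_per_second=30, utilization=0.05)

# Hypothetical shared server: same hardware, batching assumed to give 16x
# aggregate throughput, machine busy 60% of the time.
shared = cost_per_million_tokens(22_000, LIFETIME_HOURS, 0.5, 0.15,
                                 tokens_per_second=30 * 16, utilization=0.60)

print(f"home:   ${home:.2f} per 1M tokens")
print(f"shared: ${shared:.2f} per 1M tokens")
```

Under these made-up numbers the shared setup comes out far cheaper per token, because the same fixed hourly cost is spread over many more tokens. The real gap depends entirely on actual throughput, utilization and the operator's markup.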
theossuary 3 hours ago
I truly think by 2028 we'll have integrated chip systems that'll be able to run Opus 4.8-level models at ~500 watts with acceptable performance. Honestly, I think now is the worst time to invest in AI hardware. Get your harness ready and your processes perfected with hosted models, then wait a few years to buy hardware and transition to running models locally.