zozbot234 17 hours ago
The real case for private inference is not "organic", it's "slow food". Offering slow-but-cheap inference is an afterthought for the big model providers; OpenRouter, for example, doesn't support it, not even as a way of redirecting to their existing "batched inference" offerings. This is a natural opening for local AI.
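For concreteness, the "batched inference" path that does exist today looks like OpenAI's Batch API: you trade up to a day of latency for discounted per-token pricing. A minimal sketch with the OpenAI Python client (the file name and request contents are illustrative):

    # sketch: submitting a slow-but-cheap job via OpenAI's Batch API
    from openai import OpenAI

    client = OpenAI()

    # requests.jsonl holds one request per line, e.g.:
    # {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
    #  "body": {"model": "gpt-4o-mini",
    #           "messages": [{"role": "user", "content": "..."}]}}
    batch_file = client.files.create(
        file=open("requests.jsonl", "rb"), purpose="batch"
    )

    # completion_window="24h" is the trade: accept up to a day of
    # latency in exchange for cheaper tokens
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    print(batch.id, batch.status)

This is the kind of endpoint a router like OpenRouter could in principle redirect to, but currently doesn't.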
selectodude 17 hours ago | parent
But how slow is too slow? (You hit that limit faster than you'd think.) And even then, you're in for $25,000 for even the most basic on-premise slow-LLM setup.