zozbot234 10 hours ago
> and even then the inference is ungodly slow.

This is the wrong way of putting it. Local inference with SOTA models is all about trading speed for the ability to fit on bespoke, repurposed hardware. You don't need to go fast if you have the whole machine to yourself 24/7. Cloud AI vendors can't match that kind of economics.
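A back-of-envelope sketch of that amortization argument. Every number here (hardware price, throughput, electricity rate, cloud pricing) is a purely illustrative assumption, not a real benchmark or price list:

```python
# Back-of-envelope comparison: owned hardware running 24/7 vs. cloud API pricing.
# All numbers below are illustrative assumptions chosen for the sketch.

hardware_cost = 3000.0        # assumed one-time cost of a repurposed local rig (USD)
lifetime_years = 3            # assumed useful life of the hardware
power_watts = 350             # assumed average power draw while inferring
electricity_per_kwh = 0.15    # assumed electricity price (USD/kWh)
local_tokens_per_sec = 10     # assumed (deliberately slow) local throughput

cloud_price_per_mtok = 10.0   # assumed cloud price (USD per million tokens)

hours = lifetime_years * 365 * 24
local_tokens = local_tokens_per_sec * 3600 * hours          # total tokens over the rig's life
energy_cost = power_watts / 1000 * hours * electricity_per_kwh
local_cost_per_mtok = (hardware_cost + energy_cost) / (local_tokens / 1e6)

print(f"local: ${local_cost_per_mtok:.2f}/Mtok vs cloud: ${cloud_price_per_mtok:.2f}/Mtok")
```

Under these assumptions the slow local box still comes out cheaper per token, because the fixed cost is amortized over continuous 24/7 use; the conclusion flips if the machine sits mostly idle.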