bigbinary 8 hours ago

On-premise LLMs are also getting better and likely won’t stop; as costs rise alongside the technical improvements, I would imagine cost-saving methods will also improve.

horsawlarway 7 hours ago | parent [-]

I still think it's basically unavoidable that most people who might pay for API access will end up on-prem.

Fixed costs, exact model pinning, outage resistance, enshittification resistance, better security, better privacy, etc.

There are just so many compelling reasons to be on-prem instead of dependent on a third party hoovering up all your data and prompts and selling you overpriced tokens (which they eventually MUST be, because these companies have to turn a profit at some point).

If the only counterbalance is "well the api is cheaper than buying my own hardware"...

That's a short-term problem. Hardware costs are going to drop over time, and capabilities are going to keep improving. It's already pretty insane how good a model I can run locally on two old RTX 3090s.
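The "API cheaper than buying hardware" tradeoff is really just a break-even calculation. A minimal sketch, where every number (hardware price, power cost, token price, usage) is a purely hypothetical assumption for illustration, not a real quote:

```python
# Break-even point for on-prem hardware vs. pay-per-token API access.
# ALL figures below are illustrative assumptions, not actual prices.

HARDWARE_COST = 2000.0         # assumed: e.g. two used RTX 3090s
POWER_COST_PER_MONTH = 30.0    # assumed electricity cost for the rig
API_COST_PER_MTOK = 3.0        # assumed blended $/million tokens
TOKENS_PER_MONTH = 50_000_000  # assumed monthly usage (50M tokens)

def months_to_break_even():
    """Months until cumulative API spend exceeds the up-front hardware cost."""
    api_monthly = API_COST_PER_MTOK * TOKENS_PER_MONTH / 1_000_000
    savings_per_month = api_monthly - POWER_COST_PER_MONTH
    if savings_per_month <= 0:
        return None  # at this usage level the API stays cheaper indefinitely
    return HARDWARE_COST / savings_per_month

print(round(months_to_break_even(), 1))  # → 16.7 months, under these assumptions
```

The point of the sketch is that break-even depends heavily on usage volume: at low token counts the API wins, but heavy sustained usage amortizes the hardware fairly quickly, and falling hardware prices shrink the numerator over time.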

Is it as good as modern Claude? No. Is it as good as Claude was 18 months ago? Yes.

Give it a decade to see companies really push into the "diminishing returns" of scaling and new models... combined with new hardware built with these workloads in mind... and I think on-prem is the pretty clear winner.

bigbinary 6 hours ago | parent [-]

These big players don’t have as big a moat as they like to advertise, but as long as VC money wants to subsidize my agents, I’ll keep paying for the $20 plan until they inevitably cut it off.