I think the issue is that LLMs are a cash problem as much as they are a technical problem. Consumer hardware architectures are still pretty unfriendly to running models which are actually competitive to useful models so if you want to even do inference on a model that's going to reliably give you decent results you're basically in enterprise territory. Unless you want to do it really slowly.

The issue that I see is that Nvidia etc. are incentivised to perpetuate that so the open source community gets the table scraps of distills, fine-tunes etc.

▲

butlike 9 hours ago | parent | next [-]

You got me thinking that what's going to happen is some GPU maker is going to offer a subsidized GPU (or RAM stick, or ...whatever) if the GPU can do calculations while your computer is idle, not unlike Folding@home. This way, the company can use the distributed fleet of customer computers to do large computations, while the customer gets a reasonably priced GPU again.

	▲	vlovich123 9 hours ago \| parent [-]
		The kinds of GPUs that are in use in enterprise are 30-40k and require a ~10KW system. The challenge with lower power cards is that 30 1k cards are not as powerful, especially since usually you have a few of the enterprise cards in a single unit that can be joined efficiently via high bandwidth link. But even if someone else is paying the utility bill, what happens when the person you gave the card to just doesn’t run the software? Good luck getting your GPU back.

▲

cyanydeez 4 hours ago | parent | prev [-]

Consumer hardware is there. grab a mac or AMD395+ and Qwen coder and Cline or Open code and you're getting 80% of the real efficiency.