leptons 3 hours ago

>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.

That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.

jazzyjackson 2 hours ago | parent | next [-]

No, it's economies of scale. I don't understand where anyone is coming from who thinks they'll be better off buying their own hardware: why would you get a better deal on MATMULs/watt than the cloud providers?

salawat 2 hours ago | parent | next [-]

Another victim of Goldratt's Theory of Constraints. Some things are more important to optimize for than MATMULs per watt. What that is, I leave as an exercise for the student. May you realize what it is before it is too late.

jazzyjackson 2 hours ago | parent [-]

Some individuals will choose $10,000 hardware so they can keep freedom and privacy, and that's well and good. My point is just that freedom and privacy are not what win marketshare, and hence, IMHO, local LLMs are not going to catch up to and surpass frontier models like some in this thread claim.

esseph an hour ago | parent [-]

> freedom and privacy is not what wins marketshare

Digital sovereignty laws may mandate/remove access to LLMs of other countries on economic and national security grounds.

esseph an hour ago | parent | prev [-]

Within 5-10 years you're going to see a box like one of those AMD Halo nodes running in homes.

They'll be controlling lights and temperature, and adding calendar reminders that show up on your phone and your fridge. Your phone and devices might sync pictures and videos there instead of to the large cloud providers. They'll also be a media server, able to stream and multiplex whatever content you want through the home, and a VPN endpoint, likely your home router, maybe also a wifi access point.

I think this makes quite a bit of sense. I don't think they'll be ubiquitous, but they could be.

This distributes the power demand to where local solar generation can supplement it, gives the home user a lot of control, and reclaims ownership of user data from big tech.

Maybe I'm imagining things but this is what I think is coming.

It's the LLM/data heart of the home. A useful digital tool.

scheme271 2 hours ago | parent | prev [-]

We don't know the parameters, but it probably takes at least an H100, and possibly several, to run a SOTA model. Given the pricing ($25k+ per H100) and power draw (700W per H100), plus the supporting hardware in both cases, I don't see how anyone except a largish company can afford to run this.
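A back-of-envelope sketch of what those figures imply for a single 8×H100 node. The $25k price and 700W draw are the numbers cited above; the node size, host overhead, cooling multiplier, and electricity rate are illustrative assumptions, not real quotes.

```python
# Rough self-hosting cost estimate. GPU price and power are the
# figures cited in the comment; everything else is an assumption.

GPU_PRICE_USD = 25_000      # per H100, low end of the cited range
GPU_POWER_W = 700           # per H100 under load (cited)
NUM_GPUS = 8                # one typical node (assumption)
HOST_OVERHEAD_USD = 15_000  # chassis, CPU, RAM, NICs (assumption)
POWER_OVERHEAD = 1.5        # cooling + host draw multiplier (assumption)
KWH_PRICE_USD = 0.15        # illustrative electricity rate (assumption)

capex = NUM_GPUS * GPU_PRICE_USD + HOST_OVERHEAD_USD
power_kw = NUM_GPUS * GPU_POWER_W * POWER_OVERHEAD / 1000
annual_power_cost = power_kw * 24 * 365 * KWH_PRICE_USD

print(f"Upfront hardware: ${capex:,}")          # $215,000
print(f"Sustained draw:   {power_kw:.1f} kW")   # 8.4 kW
print(f"Power per year:   ${annual_power_cost:,.0f}")  # ~$11,038
```

Even with generous assumptions, the upfront cost alone is in the low six figures before any utilization argument kicks in, which is the point being made.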

sshumaker an hour ago | parent [-]

Are you serious? It takes multiple nodes to run a frontier model (a node is 8 GPUs), and they aren't running on H100s. You're looking at 32+ GPUs.