slashdave 3 hours ago

> within a few years we will be running local models as good as today’s frontier models

I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

majormajor 3 hours ago | parent | next [-]

> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

The big question I'd be asking if I were investing in one of the big players is whether those changes mean "it can do 99% instead of 97% of the tasks a user will throw at it" (at which point going local and taking back cost control/ownership makes a lot of sense, especially for companies) OR "it will fully replace a human with better output"?

I already don't need Opus for a lot of my tasks and instead choose faster/cheaper models.

The former is a company that's gonna be trying to sell mainframes against the PC. The latter is a company that is in potentially huge demand, assuming the replaced humans end up with other ways of getting money to still be able to buy stuff in the first place. ;)

iwontberude 2 hours ago | parent [-]

Exactly the right argument. A local LLM doesn't need to outrun the bear (outperform data centers); it only needs to outrun its friend (total cost of ownership).


comfysocks 2 hours ago | parent | prev | next [-]

> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

But even if scaling plateaus for the frontier models, maybe distillation will improve to the point where smaller, more manageable models can reach the same plateau. That would be great for local.
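
Roughly, distillation trains a small "student" model to match the large "teacher" model's output distribution rather than just raw labels. A minimal sketch of the standard loss (PyTorch; the temperature and variable names are illustrative assumptions, not anything specific from this thread):

  # Hinton-style distillation loss: soften both distributions with a
  # temperature, then penalize the student for diverging from the teacher.
  import torch
  import torch.nn.functional as F

  def distillation_loss(student_logits, teacher_logits, temperature=2.0):
      t = temperature
      teacher_probs = F.softmax(teacher_logits / t, dim=-1)
      student_log_probs = F.log_softmax(student_logits / t, dim=-1)
      # KL(teacher || student), scaled by t^2 as in the original recipe.
      return F.kl_div(student_log_probs, teacher_probs,
                      reduction="batchmean") * (t * t)

The open question is how much of the teacher's capability survives the compression, not whether the mechanics work.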

christopherwxyz 3 hours ago | parent | prev [-]

I would readjust your convictions.

We are only 2-4 years away from consumer grade immutable-weight ASICs.

slashdave 3 hours ago | parent [-]

We are discussing how rapid development has been, and now you want to freeze your model in silicon?

nixon_why69 2 hours ago | parent | next [-]

Why not have a bunch of SRAM and various operations like "Q4 matmul" in silicon? Model weights and even architectures could still evolve on a platform like that.
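
For the unfamiliar, "Q4 matmul" means a matrix multiply over 4-bit quantized weights with per-group scale factors. A rough sketch of the arithmetic such a fixed-function block would implement (numpy; the group size of 32 echoes llama.cpp's Q4_0 layout, but everything here is illustrative, not a real format spec):

  import numpy as np

  GROUP = 32  # weights per shared scale factor (assumed, a la Q4_0)

  def q4_matmul(x, q_weights, scales):
      # x:         (in_features,) float32 activations
      # q_weights: (out_features, in_features) ints in [-8, 7]
      # scales:    (out_features, in_features // GROUP) float32
      assert q_weights.shape[1] % GROUP == 0
      full_scales = np.repeat(scales, GROUP, axis=1)  # one scale per weight
      return (q_weights * full_scales) @ x

  x = np.random.randn(64).astype(np.float32)
  qw = np.random.randint(-8, 8, size=(16, 64))
  sc = np.random.rand(16, 64 // GROUP).astype(np.float32)
  y = q4_matmul(x, qw, sc)  # (16,) float outputs

In hardware the dequantize step gets folded into the multiply-accumulate, so the weights stay 4-bit all the way to the ALUs, while the weights themselves (and even the architecture) remain programmable as long as the SRAM is writable.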

ac29 2 hours ago | parent | next [-]

Doesn't "a bunch of SRAM" top out at maybe a few gigs per chip (with zero area used for logic)? You'd need an order of magnitude more to fit even a fairly weak general-purpose LLM.
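
Back-of-the-envelope (cell size and die area are assumed round numbers, not vendor specs):

  sram_bit_um2 = 0.021        # ~5nm-class 6T SRAM cell, approximate
  die_um2 = 800 * 1e6         # ~800 mm^2, near the reticle limit
  gb = die_um2 / sram_bit_um2 / 8 / 1e9
  print(f"all-SRAM die: ~{gb:.1f} GB")                    # ~4.8 GB

  print(f"7B at 4b/weight:  ~{7e9 * 0.5 / 1e9:.1f} GB")   # ~3.5 GB, barely fits
  print(f"70B at 4b/weight: ~{70e9 * 0.5 / 1e9:.0f} GB")  # ~10x one die

So a 7B-class model might just squeeze in, but anything bigger needs roughly an order of magnitude more than one die provides.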

throwa356262 2 hours ago | parent | prev | next [-]

I believe that is what NPUs are.

The issue is the huge amount of DRAM and the high bandwidth these models require.
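
The bandwidth point is the binding one: during decode, each generated token streams (roughly) every weight through the compute units once, so tokens/sec is bounded by bandwidth divided by model size. A sketch with assumed round numbers:

  model_bytes = 7e9 * 0.5  # 7B params at 4 bits/weight ~ 3.5 GB

  for name, gbps in [("laptop LPDDR5", 100),
                     ("unified memory", 400),
                     ("datacenter HBM", 3000)]:
      print(f"{name:15s} ~{gbps * 1e9 / model_bytes:4.0f} tok/s upper bound")

NPUs supply the matmul throughput, but without a memory system that can feed them the weights, they sit idle.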

rogerrogerr 3 hours ago | parent | prev | next [-]

Genuine question from a place of ignorance: what in the silicon pipeline makes it take 2-4 years to produce chips with a new model on them? Curious what the process bottleneck is.

jazzyjackson 2 hours ago | parent | next [-]

Without being an insider, I imagine that most global fab capacity is contracted out several years in advance.

You might be interested in the Tiny Tapeout project, which guides you through the process of getting your own design etched on silicon. If you only need larger features, and not the next-gen single-digit-nanometer stuff, you may not be so supply constrained.

https://tinytapeout.com/

pjc50 2 hours ago | parent | prev [-]

I think you could get it down to three months between weight changes if you can encode them in metal layers only. The remaining limits are the fab lead time and the cost of a metal respin (hundreds of thousands to millions of dollars, depending on the process).

dangus 2 hours ago | parent | prev [-]

If the silicon costs $200-300 and the company throws it away every two years, that's cheaper than a subscription.
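
Rough break-even (the subscription price is an assumption, not a quote):

  asic_cost = 300          # $, high end of the range above
  sub_per_month = 20       # $, typical consumer AI subscription (assumed)
  lifetime_months = 24     # "throws it away every two years"

  print(f"subscription: ${sub_per_month * lifetime_months} over the lifetime")  # $480
  print(f"ASIC pays for itself by month {asic_cost // sub_per_month}")          # month 15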

Also, how many companies will just buy an M6/M7 MacBook Pro with 32GB+ of RAM in a couple of years and get “free” AI along with the workstation they were going to buy anyway?