| ▲ | dehrmann 7 hours ago |
| I wonder if the future in ~5 years is almost all local models? High-end computers and GPUs can already run decent models, but not SOTA ones. 5 years is enough time to ramp up memory production, for consumers to level up their hardware, and for models to be optimized down to lower-end hardware while still being really good. |
|
| ▲ | johnsmith1840 4 hours ago | parent | next [-] |
| Open-source and local models will always lag heavily behind the frontier. Who pays for a free model? GPU training isn't free! I remember people early on saying 100B+ models would be running on your phone by about now. They were completely wrong, and I don't think that's ever really going to change. People will always want the fastest, best models with the easiest setup. What counts as "good enough" changes massively when, in the near future, your marketing team is managing k8s clusters with frontier systems. |
| |
| ▲ | margalabargala 3 hours ago | parent | next [-] | | I don't think this is as true as you think. People do not care about the fastest and best past a point. Let's use transportation as an analogy. If all you have is a horse, a car is a massive improvement. And when cars were just invented, a car with a 40mph top speed was a massive improvement over one with a 20mph top speed, and everyone swapped. While cars with 200mph top speeds exist, most people don't buy them. We all collectively decided that for most of us, most of the time, a top speed of 110-120mph was plenty, and that envelope stopped being pushed for consumer vehicles. If what currently takes Claude Opus 10 minutes can be done in 30ms, then making something that does it in 20ms isn't going to be enough to get everyone to pay a bunch of extra money. Companies will buy the cheapest thing that meets their needs. SOTA models right now are much better than the previous generation, but we have been seeing diminishing returns, with smaller jumps each of the last couple of generations. If the gap between the current and last generation shrinks enough, people won't pay extra for the current one if they don't need it. Just like right now you might use Sonnet or Haiku if you don't think you need Opus. | | |
| ▲ | johnsmith1840 9 minutes ago | parent [-] | | This assumes a hard plateau that we can effectively optimize toward forever; possible, but we haven't seen it. Again, my point is that "good enough" changes as possibilities open up. Marketing teams running entire infra stacks is an insane idea today but may not be in the future. You could easily code with a local model similar to GPT-4 or GPT-3 now, but a frontier model will 10-100x your performance, and that fundamentally will not change. Hmmm, but maybe there's an argument for a static task. Once a model hits the required ability on that specific task, you can optimize it into a smaller model. So I guess I buy the argument for people working on tasks of statically capped complexity? PII detection, for example: a <500M model will outperform a 1-8B param model on that narrow task. But at the same time, a PII detection bot alone is not a product anymore. So yes, an open-source model does it, but as a result it's fundamentally less valuable, and I need to build higher up the stack for the value? |
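To make the narrow-task point concrete, here is a minimal sketch assuming the Hugging Face transformers library, with a small off-the-shelf NER model (~110M parameters) standing in for a purpose-trained PII detector:

```python
# A minimal sketch of the "small model, narrow task" point above.
# dslim/bert-base-NER is a small general NER model; a real PII system
# would fine-tune something like it on PII-specific labels.
from transformers import pipeline

detector = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

text = "Contact Jane Doe at Acme Corp in Berlin."
for entity in detector(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```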
| |
| ▲ | kybernetikos 3 hours ago | parent | prev | next [-] | | GPT-3.5, as used in the first commercially available ChatGPT, is believed to have hundreds of billions of parameters. There are now models I can run on my phone that feel like they have similar levels of capability. Phones are never going to run the largest models locally because they simply don't have the memory, but we're seeing such improvements in capability at small sizes that you can now run a model on your phone that would have required hundreds of billions of parameters less than 6 years ago. | | |
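For a sense of what running one of these small models locally looks like, here is a minimal sketch using llama-cpp-python; the GGUF file path is a placeholder for any small quantized model you have downloaded:

```python
# A minimal sketch of running a small quantized model on local hardware.
# The model path is an assumption; substitute any ~1-4B parameter GGUF
# file you have on disk.
from llama_cpp import Llama

llm = Llama(model_path="./phi-3-mini-4k-instruct-q4.gguf", n_ctx=2048)
out = llm("Explain what a closure is in JavaScript.", max_tokens=128)
print(out["choices"][0]["text"])
```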
| ▲ | onion2k 2 hours ago | parent [-] | | The G in GPT stands for Generalized. You don't need that for specialist models, so the size can be much smaller. Even coding models are quite general, as they don't focus on a single language or domain. I imagine a model specifically for something like React could be very effective with a couple of billion parameters, especially if it were distilled from a more general model. | | |
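For reference, the standard knowledge-distillation recipe the comment alludes to is a temperature-softened KL loss between teacher and student outputs (Hinton et al.). A minimal PyTorch sketch, assuming you already have logits from your own teacher and student models:

```python
# A minimal sketch of the standard distillation loss: the student is
# trained to match the teacher's temperature-softened distribution.
# Nothing here is specific to React models; the logits are assumed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soft targets: teacher distribution softened by the temperature
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 so gradient magnitudes stay comparable
    return F.kl_div(log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Usage with dummy logits (batch of 4, vocab of 10):
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
loss = distillation_loss(student, teacher)
loss.backward()
```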
| ▲ | christkv 27 minutes ago | parent | next [-] | | That's what I want: an orchestrator model that operates with a small context, and then very specialized small models for React etc. | |
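A hypothetical sketch of that orchestrator-plus-specialists shape; the model names and the route() keyword heuristic are purely illustrative stand-ins for what would really be a small classifier model:

```python
# Hypothetical sketch of an orchestrator dispatching to narrow
# specialists. All model names are made up for illustration.
SPECIALISTS = {
    "react": "local/react-coder-2b",  # hypothetical distilled React model
    "sql": "local/sql-helper-1b",     # hypothetical narrow SQL model
}

def route(task: str) -> str:
    """Pick a specialist model name for a task description."""
    lowered = task.lower()
    # In practice this would itself be a small, short-context model;
    # a keyword heuristic stands in for it here.
    if any(k in lowered for k in ("component", "jsx", "react")):
        return SPECIALISTS["react"]
    if any(k in lowered for k in ("query", "select", "join")):
        return SPECIALISTS["sql"]
    return "local/generalist-8b"      # fall back to a generalist

print(route("Refactor this React component to use hooks"))
# -> local/react-coder-2b
```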
| ▲ | MzxgckZtNqX5i 2 hours ago | parent | prev [-] | | I'll be that guy: the "G" in GPT stands for "Generative". |
|
| |
| ▲ | torginus 2 hours ago | parent | prev | next [-] | | I don't know about frontier models, but nowadays I code a lot using Opus 4.5, by instructing it to do something (like a complex refactor) - I like that it's really good at actually doing what it's told, and only occasionally do I have to fight it when it goes off the rails. It also doesn't hallucinate all that much in my experience (I'm writing JS; YMMV with other languages), and it's good at spotting dumb mistakes. That said, I'm not sure if this capability is only achievable in huge frontier models. I would be perfectly content using a model that can do this (acting as a force multiplier) and not much else. | |
| ▲ | __MatrixMan__ 2 hours ago | parent | prev | next [-] | | I think we'll eventually find a way to make the cycle smaller, so instead of writing a Stack Overflow post in 2024 and using a model trained on it in 2025, I'll be contributing to the expertise of a distributed-model-ish-thing on Monday and benefiting from that contribution on Tuesday. When that happens, the most powerful AI will be whichever has the most virtuous cycles going with as wide a set of active users as possible. Free will be hard to compete with, because raising the price would exclude the users that make it work. Until then, though, I think you're right that open will lag. | |
| ▲ | Vinnl 2 hours ago | parent | prev | next [-] | | > People will always want the fastest, best models with the easiest setup When there are no other downsides, sure. But when the frontier companies start tightening the thumbscrews, price will influence what people consider good enough. | |
| ▲ | bee_rider 2 hours ago | parent | prev [-] | | The calculus will probably shift in favor of locally hosted models once investor generosity runs out for the remotely hosted ones. |
|
|
| ▲ | manbitesdog 6 hours ago | parent | prev | next [-] |
| Plus a long queue of yet-undiscovered architectural improvements |
| |
| ▲ | vercaemert 5 hours ago | parent [-] | | I'm surprised there isn't more "hope" in this area. Take things like the GPT Pro models: surely that sort of reasoning/synthesis will eventually make its way into local models, and that's something that's already been discovered. Just the other day I was reading a paper about ANNs whose connections aren't strictly feedforward but instead contain many circular connections. This increases expressiveness at the (huge) cost of breaking the current gradient descent algorithms. As compute gets cheaper and cheaper, these things will become feasible (greater expressiveness, after all, equates to greater intelligence). | | |
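A toy illustration of the circular-connections idea: a network whose weight matrix contains cycles, run toward a fixed point rather than in one feedforward pass, and trained by naive random search since ordinary backprop assumes an acyclic compute graph. Everything below is illustrative, not the paper's method:

```python
# Toy illustration only: a 6-unit network whose weight matrix W contains
# cycles, iterated toward a fixed point, trained gradient-free. XOR is
# the stand-in task.
import numpy as np

rng = np.random.default_rng(0)

def run(W, x, steps=20):
    h = np.zeros(W.shape[0])
    h[: len(x)] = x                  # clamp inputs onto the first units
    for _ in range(steps):           # iterate the cyclic dynamics
        h = np.tanh(W @ h)
        h[: len(x)] = x
    return h[-1]                     # read the output from the last unit

def loss(W, data):
    return sum((run(W, x) - y) ** 2 for x, y in data)

data = [(np.array(p), float(p[0] != p[1]))
        for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
W = rng.normal(0.0, 0.5, (6, 6))
best = loss(W, data)
for _ in range(2000):                # naive random search, no backprop
    cand = W + rng.normal(0.0, 0.1, W.shape)
    cand_loss = loss(cand, data)
    if cand_loss < best:
        W, best = cand, cand_loss
print("final loss:", best)
```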
| ▲ | bigfudge 3 hours ago | parent [-] | | It seems like a lot of the benefit of SOTA models comes from data, though, not architecture? Won't the moat of the big 3/4 players in getting data only grow as they are integrated deeper into business workflows? | | |
| ▲ | vercaemert 2 hours ago | parent [-] | | That's a good point. I'm not familiar enough with the various moats to comment; I was just talking at a high level. If transformers are HDD technology, maybe there's an SSD equivalent right around the corner that's a paradigm shift for the whole industry (but to the average user just looks like better/smarter models). It's a very new field, and it's not unrealistic that major discoveries will shake things up in the next decade or less. |
|
|
|
|
| ▲ | infinitezest 6 hours ago | parent | prev | next [-] |
| A lot of memory manufacturers are bailing on consumer lines to focus on enterprise, from what I've read. Not great. |
|
| ▲ | regularfry 6 hours ago | parent | prev [-] |
| Even without leveling up hardware, 5 years is a loooong time to squeeze the juice out of lower-end model capability. Although in this specific niche we do seem to be leaning on Qwen a lot. |
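For context, pointing a local workflow at Qwen can be as small as this sketch with the ollama Python client; the model tag is an assumption, so substitute whichever Qwen build you have pulled:

```python
# A minimal sketch of using a local Qwen model via the ollama Python
# client. The model tag is assumed; use whatever you have pulled.
import ollama

resp = ollama.chat(
    model="qwen2.5-coder",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp["message"]["content"])
```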