> The reason this matters is that LLMs are incredibly nifty often useful tools that are not AGI and also seem to be hitting a scaling wall

I don't know who needs to hear this, but the real breakthrough we have had in AI is not LLMs but generative AI; LLMs are just one specific case. Furthermore, we have hit absolutely no walls. Go download a model from Jan 2024, another from Jan 2025 and one from this year, and compare. The improvement in quality has been exponential.

missedthecue an hour ago | parent | next [-]

There is a lot of talking past each other when discussing LLM performance. The average person whose typical use case is asking ChatGPT how long to boil an egg hasn't seen improvements for 18 months. Meanwhile, if you're super into something like local models, the tangible improvements are, without exaggeration, happening almost monthly.

raincole an hour ago | parent | prev | next [-]

> exponential

Is this the second most abused English word (after 'literally')?

> a model from Jan 2024, another from Jan 2025 and one from this year

You literally can't tell whether the difference is 'exponential', quadratic, or whatever from three data points.
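To make that concrete, here is a minimal sketch. The three scores are made-up numbers, not real benchmark results, and `fit_quadratic` / `fit_exponential` are hypothetical helper names. Both a quadratic and a shifted exponential have three free parameters, so each can pass exactly through any three (non-collinear) points, and they only disagree once you extrapolate:

```python
def fit_quadratic(y0, y1, y2):
    """Exact quadratic a + b*x + c*x**2 through (0, y0), (1, y1), (2, y2)."""
    c = (y2 - 2 * y1 + y0) / 2
    b = (y1 - y0) - c
    a = y0
    return lambda x: a + b * x + c * x ** 2


def fit_exponential(y0, y1, y2):
    """Exact shifted exponential A + B*r**x through the same three points.

    Assumes y1 != y0 and a growth ratio r != 1 (the points are not collinear).
    """
    r = (y2 - y1) / (y1 - y0)
    B = (y1 - y0) / (r - 1)
    A = y0 - B
    return lambda x: A + B * r ** x


# Hypothetical "model quality" scores for three yearly snapshots (x = 0, 1, 2).
scores = [1.0, 2.0, 4.0]

quad = fit_quadratic(*scores)
expo = fit_exponential(*scores)

# Both curves reproduce the three observed points exactly...
for x, y in enumerate(scores):
    print(x, y, quad(x), expo(x))

# ...but they tell very different stories a few steps later.
print("x=5:", quad(5), expo(5))
```

With only three snapshots, the data cannot favor one curve over the other; a fourth or fifth point is the minimum needed to start telling them apart.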

Plus, it's not my experience at all. Since DeepSeek, I haven't found that models one can run on consumer hardware have gotten much better.

binary132 3 hours ago | parent | prev | next [-]

>go download a model

GP was talking about commercially hosted LLMs running in datacenters, not free Chinese models.

Local is definitely still improving. That’s another reason the megacenter model (NVDA’s big “line up forever” plan) is either a financial catastrophe about to happen or the biggest bailout ever.

wahnfrieden 3 hours ago | parent [-]

GPT 5.2 is an incredible leap over 5.1 / 5

hadlock an hour ago | parent [-]

5.2 is great if you ask it engineering questions, or questions an engineer might ask. It is extremely mid, and actually worse than the o3/o4-era models, if you start asking it trivia, like whether the I-80 tunnel on the Bay Bridge (Yerba Buena Island) is the largest bore in the world. Don't even get me started on whatever model is wired up to the voice chat button.

But yes, it will write you a flawless, physics-accurate flight simulator in Rust on the first try. I've proven that. I guess what I'm trying to say is that Anthropic was eating their lunch at coding, and OpenAI rose to the challenge, but if you're not doing engineering tasks, their current models are arguably worse than older ones.

SoftTalker an hour ago | parent | next [-]

My impression is that software developers are the lion's share of people actually paying for AI, but perhaps that's just my bubble worldview.

magicalhippo an hour ago | parent | prev [-]

But how many are willing to fork over $20 or so a month to ask simple trivia questions?
