> Wrong, mostly.

> Model capability is a function of model size

Model effectiveness has improved across model sizes. You really should try the latest flash variants more. They have become my default for most tasks except for gnarly high-level planning.

▲

trollbridge 6 hours ago | parent | next [-]

Right - the idea that "bigger model = better" might have been true a year ago, but the flash models are extremely effective right now. You simply use them for the tasks they are ideally suited for.

▲

ACCount37 7 hours ago | parent | prev [-]

"Capability per parameter" is rising, but parameter count remains an advantage. And small models remain bad, because "good" is a rapidly moving target.

A 2026 4B beats a 2024 4B, but both are far behind the contemporary frontier. Which makes them bad. There is no such thing as "too much capability" - a "good" model is whatever the current frontier is.

In 2024, a "good" model was one that could be trusted to write an 800-line script. In 2026, it's a model that can be trusted to do gnarly high-level planning and execution both. In 2028, it's going to be something like a model you can point at an extremely involved task, abandon, and have it report back with a "done" in 3 weeks.

▲

overfeed 5 hours ago | parent [-]

> A 2026 4B beats a 2024 4B, but both are far behind the contemporary frontier.

The thing about engineering is you don't just use the biggest bolt on the market on every bridge.

> In 2024, a "good" model is one that can be trusted to write a 800 line script. In 2026, it's a model that can be trusted to do gnarly high-level planning and execution both

This sounds a lot like having a single diamond-head hammer as the only tool in your toolbox. As suggested by the name, flash models are fast - sometimes I want to write the equivalent of fifty 800-line scripts. There is such a thing as good enough.

▲

ACCount37 5 hours ago | parent [-]

Good enough? That's a lie people tell each other because they lack imagination.

"It's good enough" was said about GPT-4, o1, o3, Opus 4 and more. Guess what happened? Newer models released, people updated their expectations of what LLMs can do, usage got more aggressive, and somehow, GPT-4 went from "good enough" to "obsolete trash".

If you have no imagination, then at least substitute your pattern recognition for it.

The world is hungry for capabilities. There are piles upon piles of tasks that aren't done by LLMs simply because LLMs aren't good enough to do them.

The thing a frontier model gives you is "you don't have to babysit a model to get it to do X", and that X gets more and more impressive release to release.

▲

overfeed 4 hours ago | parent [-]

I wish you had addressed at least one of my arguments in good faith before jumping to insults and countering a strawman argument I didn't make - I never claimed there will be no use for more capable models.

You do your AI-maximalism, and I'll stick to making trade-offs based on the needs of each piece of work.

▲

ACCount37 4 hours ago | parent [-]

I.e. spending your time and effort on making choices that don't matter. I'll do more "per-task model selection" when AIs themselves get good at it.