mattw1810 5 days ago

On the whole GPT-4 to GPT-5 is clearly the smallest increase in lucidity/intelligence. They had pre-training figured out much better than post-training at that point though (“as an AI model” was a problem of their own making).

I imagine the GPT-4 base model might hold up pretty well on output quality if you’d post-train it with today’s data & techniques (without the architectural changes of 4o/5). Context size & price/performance maybe another story though

energy123 4 days ago | parent | next [-]

Basic prose is a saturated bench. You can't go above 100% so by definition progress will stall on such benchmarks.

mattw1810 4 days ago | parent [-]

All the same they choose to highlight basic prose (and internal knowledge, for that matter) in their marketing material.

They’ve achieved a lot in making recent models more reliable as building blocks and more capable at things like math, but for LLMs, saturating prose is to a degree equivalent to saturating usefulness.

jstummbillig 4 days ago | parent | prev [-]

> On the whole GPT-4 to GPT-5 is clearly the smallest increase in lucidity/intelligence

I think it's far more likely that we are increasingly incapable of understanding/appreciating all the ways in which it's better.

achierius 4 days ago | parent [-]

Why? It sounds like you're using "I believe it's rapidly getting smarter" as evidence for "so it's getting smarter in ways we don't understand", but I'd expect the causality to go the other way around.

jstummbillig 4 days ago | parent [-]

Simply because of what we know about our ability to judge capabilities and systems. It's much harder to judge solutions to hard problems. You can demonstrate that you can add 2+2, and anyone can be the judge of that ability, but if you try to convince someone of a mathematical proof you came up with, that is a much harder thing to do, regardless of how capable you are or how hard the proof was to write.

The more complicated and/or complex things become, the less likely it is that a human can act as a reliable judge. At some point no human can.

So while it could definitely be the case that AI progress is slowing down (AI labs seem to not think so, but alas), what is absolutely certain is that our ability to appreciate any such progress is diminishing already, because we know that this is generally true.

brabel 4 days ago | parent | next [-]

This thread shows that. People here are saying gpt-1 was the best at writing poetry; I wonder how good they are at judging poetry themselves. I saw a blind study in which readers rated a story written by gpt-5 above an actual human bestseller. I assume the raters were actual experts, but I would need to check that.

dgfitz 4 days ago | parent | prev [-]

> The more complicated and/or complex things become, the less likely it is that a human can act as a reliable judge. At some point no human can.

Give me an example, please. I can't come up with something that started simple and became too complex for humans to "judge". I am quite curious.

jstummbillig 4 days ago | parent [-]

I did not mean "become" in the sense of "evolve", but as in "further along an imagined continuum containing all things, going from simple/easy to complex/complicated" (but I can see how that was ambiguous).