Workaccount2 a day ago

I think people view those early days through rose-tinted glasses, and general usability benchmarks being mostly saturated now reinforces that. The story goes: GPT-3.5 would say Dallas was the capital of the USA, but GPT-4 got it right every time!

GPT-4 launched with an 8k context window. It hallucinated regularly. It was slow. One-shotting code was unheard of; you had to iterate and iterate. It fell over even on basic math problems.

GPT-5 Thinking, on the other hand, is so capable that the average person wouldn't really be able to test its abilities. Only experts working in their own domain can find its stumbling blocks.

I think the constant incremental updates create a staircase of small steps. But if you really reflect and look back, you'll see that the capability gap from 3.5 to 4 is far smaller than the gap from 4 to 5. Benchmarks echo this too: GPT-5 is solving problems wildly beyond what GPT-4 was capable of.

heyitsguay a day ago

What problems?