ben_w 12 hours ago

They've definitely improved in many areas. And not just the easily-gamed public metrics; I've got a few private tests of my own, asking them certain questions to see how they respond, and even on the questions where all versions make mistakes in their answers, they make fewer mistakes than they used to.

I can also see this live, as I'm on a free plan and currently using ChatGPT heavily, and I can watch the answers degrade as I burn through the free allowance of high-tier models and end up on the cheap models.

Now, don't get me wrong, I wouldn't rank even the good models higher than a recent graduate, but that's a step up from ChatGPT-3.5, whose responses felt more like those of a first or second year university student.

Likewise with the economics: I think we're in a period where you have to multiply training costs to get incremental performance gains, so there's an investment bubble and it will burst. I don't think the current approach will reach generally superhuman skills, because it will cost too much to get there. AI in general already demonstrates specific superhuman skills, but the more general models are mostly only superhuman by being "fresh grad" at a very broad range of things; if any LLM is superhuman at even one skill, then I've missed the news.