mFixman 2 days ago

Because benchmarks are meaningless and, despite so many years of development, LLMs become crap at coding or producing anything productive as soon as you move even slightly away from the things being benchmarked.

I wouldn't mind if GPT-5 were 500% better than previous models, but it's a small iterative step from "bad" to "bad but more robotic".