mFixman 2 days ago

The whole mess is a good example of why benchmark-driven development has negative consequences.

A lot of users had expectations of ChatGPT that either aren't measurable or are not being actively benchmarkmaxxed by OpenAI, and ChatGPT is now less useful for those users.

I use ChatGPT for a lot of "light" stuff, like suggesting travel itineraries based on what it knows about me. I don't care about this version being 8.243% more precise, but I do miss the warmer tone of 4o.

Terretta 2 days ago

> I don't care about this version being 8.243% more precise, but I do miss the warmer tone of 4o.

Why? 8.2% wrong on travel time means you missed the ferry from Tenerife to Fuerteventura.

You'll be happy to hear Altman said they're making it warmer.

I'd think the glaze mode should be the optional mode.

mFixman 2 days ago

Because benchmarks are meaningless, and despite so many years of development, LLMs become crap at coding or producing anything productive as soon as you move even slightly away from the things being benchmarked.

I wouldn't mind if GPT-5 was 500% better than previous models, but it's a small iterative step from "bad" to "bad but more robotic".

tankenmate 2 days ago

"glaze mode"; hahaha, just waiting for GPT-5o "glaze coding"!