missedthecue 3 hours ago

There is a lot of talking past each other when discussing LLM performance. The average person whose typical use case is asking ChatGPT how long they need to boil an egg for hasn't seen improvements for 18 months. Meanwhile, if you're super into something like local models, for example, the tangible improvements are, without exaggeration, happening almost monthly.

Foobar8568 40 minutes ago

Random trivia are answered much better in my case.

petesergeant an hour ago

> The average person whose typical use case is asking ChatGPT how long they need to boil an egg for hasn't seen improvements for 18 months

I don’t think that’s true. I think both my mother and my mother-in-law would start to complain pretty quickly if they got pushed back to 4o. Change may have felt gradual, but I think that’s more a function of growing confidence in what they can expect the machine to do.

I also think “ask how long to boil an egg” is missing a lot here. Both use ChatGPT in place of Google for all sorts of shit these days, including plenty of stuff they shouldn’t (like: “will the city be doing garbage collection tomorrow?”). Both are pretty sharp women but neither is remotely technical.