teaearlgraycold 5 days ago

Personally I don't look at or respect LLM benchmarks at all. I've seen SOTA models fail in incredibly shocking ways even recently. Those moments immediately bring me out of the delusion that LLMs have thinking capacity or an understanding of code.

phatskat 5 days ago | parent

> the delusion that LLMs have thinking capacity

It’s such a strange delusion too, because it’s easy to get caught up in it for a moment, and just as easy to be snapped back out: “oh no, this thing is as smart as a bag of bricks”.

What strikes me more is how these companies sell their AI offerings - we watched an OpenAI presentation about spec-driven development recently and the presenter was, idk, fine enough, if maybe a bit grandiose. But what really nagged me was the way he ended his presentation with something along the lines of “we’re excited to see AGI continue to grow”, and it’s honestly A) depressing and B) downright fraud - there is no current AGI to speak of, it’s all just guessing the string of words that sound best together, and this OpenAI rep _knows this_.

They know that no amount of up-front spec writing will prevent bugs.

They know that their LLM doesn’t “know” anything in an actually meaningful way.

They know that calling what they have “AGI” is aspirational at best and lying at worst.