otabdeveloper4 5 days ago

No they haven't.

They hallucinate exactly as much as they did five years ago.

atleastoptimal 5 days ago

Absolutely untrue. Claiming that GPT-3 hallucinates as much as o3 over the same token horizon on the same prompts is a silly notion, and dozens of benchmarks easily disprove it. You can code a complete web app with models now, something far beyond the means of models from that long ago.

otabdeveloper4 5 days ago

> caveats and weasel words

> "benchmarks"

Stop drinking the Kool-Aid and making excuses for LLM limitations, and instead learn to use the tools properly within their limits.

antihero 5 days ago

They really don’t though.

otabdeveloper4 5 days ago

Larger context lengths are awesome, but they don't fundamentally change the failure modes of LLMs.