simonw 12 hours ago

Here's another one that went un-cited:

> When you ask AI to generate code with dependencies, it hallucinates non-existent packages 19.7% of the time. One. In. Five.

> Researchers generated 2.23 million packages across various prompts. 440,445 were complete fabrications. Including 205,474 unique packages that simply don’t exist.

That looks like this report from June 2024: https://arxiv.org/abs/2406.10279
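As a quick sanity check, the quoted figures are at least internally consistent. This is a minimal sketch using only the numbers in the quote; the variable and function names are mine, and "2.23 million" is a rounded total, so the computed rate can differ from the quoted 19.7% in the last digit.

```python
# Figures copied from the quote above; the total is rounded in the source.
TOTAL_PACKAGES = 2_230_000      # "2.23 million packages"
HALLUCINATED = 440_445          # "complete fabrications"
UNIQUE_HALLUCINATED = 205_474   # "unique packages that simply don't exist"


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


hallucination_rate = share(HALLUCINATED, TOTAL_PACKAGES)
unique_share = share(UNIQUE_HALLUCINATED, HALLUCINATED)

print(f"hallucinated: {hallucination_rate:.2%} of all generated packages")
print(f"unique names: {unique_share:.2%} of the hallucinated packages")
```

So the "one in five" headline does follow from the totals, and a bit under half of the hallucinated names are distinct.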

Here's the thing: the quoted numbers are totals across 16 early-2024 models, and most of those hallucinations came from models with names like CodeLlama 34B Python, WizardCoder 7B Python, CodeLlama 7B, and DeepSeek 6B.

The models with the lowest hallucination rates in that study were GPT-4 and GPT-4-Turbo. The models we have today, 16 months later, are all a huge improvement on those models.