| ▲ | lexicality an hour ago | |
a lot of the training data is either for Python 2 or just generally very low quality | ||
| ▲ | stuaxo an hour ago | parent | next [-] | |
The quality issue doesn't seem unique to Python. I've seen the versioning issue in many languages, with libraries whose APIs change between versions. I don't tend to hit Python 2 issues using LLMs with it, but I do hit library things (e.g. Pydantic likes to make changes between versions, as do loads of the libraries used a lot by AI companies). | ||
| ▲ | prodigycorp an hour ago | parent | prev [-] | |
That could be it. I still see LLMs fail a set of static typing challenges that I created a couple of years ago as a benchmark. Google models still fail it. I wonder if the lack of typing in a lot of the training data makes Python harder to reason about? | ||