lexicality an hour ago

A lot of the training data is either for Python 2 or just generally very low quality.

stuaxo an hour ago | parent | next [-]

The quality issue doesn't seem unique to Python.

I've seen the versioning issue in many languages, with libraries whose APIs change between versions.

I don't tend to hit Python 2 issues when using LLMs with it, but I do hit library issues (e.g. Pydantic likes to make changes between versions, as do loads of the libraries used a lot by AI companies).

prodigycorp an hour ago | parent | prev [-]

That could be it. I still see LLMs fail a set of static typing challenges that I created a couple of years ago as a benchmark. Google models still fail it. I wonder if the lack of typing in a lot of the training data makes Python harder to reason about?