gertlabs 2 hours ago

Surprisingly, LLMs are actually much worse at reasoning in Python than in other common programming languages for agentic coding tasks.

Data here: https://gertlabs.com/rankings?mode=agentic_coding

BariumBlue an hour ago | parent | next [-]

Hah, I was just thinking that Python likely has a vast ocean of training data, but it's likely of lower quality, since much of it is written by beginners and those who aren't primarily programmers.

topham 41 minutes ago | parent [-]

There's a broken idea that AIs know Python because they're written in Python.

Not how any of it works.

gertlabs 19 minutes ago | parent [-]

While recent models are capable of generalizing to any language at this point, I do think there are weights from their pretraining corpus that still leak through into how they create their responses. We observed similar language preference patterns across models from different providers, btw.

isityettime 41 minutes ago | parent | prev | next [-]

I would love to see how they do with functional languages and especially Lisps here. I've noticed pretty good performance with Emacs Lisp relative to overall model strength, but I haven't used LLMs to write application code in any such languages.

It would also be interesting to see how Python compares to other languages in its niche (Ruby, Perl, Raku).

Thanks for putting this together! It's interesting.

bushbaba an hour ago | parent | prev | next [-]

Cool to see my hunch backed by data. Python is a scripting language with OOP bolted on. That means there's not really the stylistic consistency that other languages have, and things tend to look like PHP: a collection of various scripts that invoke one another.

rossjudson an hour ago | parent | prev | next [-]

My standard joke here:

Q: Say, what does this Python code do?

A: Nobody f&%^ing knows.

thfuran 43 minutes ago | parent [-]

That’s Perl.

altmanaltman 18 minutes ago | parent | prev | next [-]

Hey, they said it had a lot of training data, not necessarily high-quality Python code training data.

ricardo_lien 20 minutes ago | parent | prev [-]

This surprised me, but I can understand it - Python sucks in many ways lol.