dingnuts 4 days ago

I picked one of the studies in the search (!) you linked. First of all, it's a bullshit debate tactic to try to overwhelm your opponents with vague studies -- a search is complete bullshit because it puts the onus on the other person to discredit the gargantuan amount of data you've flooded them with. Many of the studies in that search don't have anything to do with programming at all.

So right off the bat, I don't trust you. Anyway, I picked one study from the search to give you the benefit of the doubt. It compared leetcode in the browser to LLM generation. This tells us absolutely nothing about real world development.

What made the METR paper interesting was that they studied real projects, in the real world. We all know LLMs can solve well-bounded problems in their data sets.

As for 3, I've seen a lot of broken nonsense. Let me know when someone vibe codes up a new mobile operating system or a competitor to KDE and GNOME lol

keeda 2 days ago

> a search is complete bullshit because it puts the onus on the other person to discredit the gargantuan amount of data you've flooded them with.

Alternatively, a search is a way to show that basing your opinions on limited personal experience or a single source is silly given the vast amount of other research out there that largely contradicts it. Worse if that single source itself happens to have flaws that are not sufficiently discussed, e.g. at least one of the 16 participants from the METR study deliberately filtered out large tasks that he strongly preferred to do only with AI -- what does that mean for its results?

https://xcancel.com/ruben_bloom/status/1943536052037390531

> Many of the studies in that search don't have anything to do with programming at all.

That's fair, but unfortunately that's down to the limits of keyword search. For instance "medical coding" is not programming-related at all, but is being impacted by LLMs and gets caught in the keyword search ¯\_(ツ)_/¯

Anyway, if your preference is "real-world projects", here are a couple of specific studies (including one that I had already separately mentioned in the linked comment) at much larger scales that show significant productivity boosts from LLM-assisted programming for developers doing their regular, day-job tasks:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566 (4867 developers across 3 large companies including Microsoft)

https://www.bis.org/publ/work1208.pdf (1219 programmers at a Chinese BigTech)

There are many more, but I left them out as they are based on other methodologies such as the use of standardized tasks for better comparability, or empirical analysis of open-source commits, or developer surveys, or student projects, which may get dismissed as "not an RCT on real-world tasks." Interestingly they all show comparable, positive results, so consider that dismissing other studies as irrelevant is not that straightforward.