lunar_mycroft 14 hours ago

Some (partial) counterpoints:

- I think that, given publicly available metrics, it's clear this isn't translating into more products/apps getting shipped. That could be because devs are now running into other bottlenecks, but it could also indicate that there's something wrong with these studies.

- Most devs who say AI speeds them up assert numbers much higher than what those studies have shown. Much of the hype around these tools is built on those higher estimates.

- I won't claim to have read every study, but of the ones I have checked in the past, the more the methodology impressed me, the less of an effect it showed.

- Prior to LLMs, it was near universally accepted wisdom that you couldn't really measure developer productivity directly.

- Review is imperfect, and LLMs produce worse code on average than human developers. That should result in somewhat lowered code quality with LLM usage (although that might be an acceptable trade-off for some). The fact that some of these studies didn't find that is another thing that suggests there are shortcomings in said studies.

keeda 13 hours ago | parent

> - Most devs who say AI speeds them up assert numbers much higher than what those studies have shown.

I am not sure how much of it is just programmers saying "10x" because that is the meme, but when realistic numbers are mentioned at all, I see people claiming 20 - 50%, which lines up with the studies above. E.g. https://news.ycombinator.com/item?id=45800710 and https://news.ycombinator.com/item?id=46197037

> - Prior to LLMs, it was near universally accepted wisdom that you couldn't really measure developer productivity directly.

Absolutely, and all the largest studies I've looked at mention this clearly and explain how they try to address it.

> Review is imperfect, and LLMs produce worse code on average than human developers.

Wait, I'm not sure that can be asserted at all. Anecdotally it's not my experience, and the largest study in the link above explicitly discusses it and finds that proxies for quality (like approval rates) indicate an improvement rather than a decline. The Stanford video accounts for code churn (possibly due to fixing AI-created mistakes) and still finds a clear productivity boost.

My current hypothesis, based on the DORA and DX 2025 reports, is that quality is largely a function of your quality control processes (tests, CI/CD, etc.).
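
By "quality control processes" I mean automated gates that every change has to clear regardless of who (or what) wrote it. Purely as a hypothetical illustration of that idea, not anything taken from the DORA/DX reports, a minimal gate script might look like this (tool names like pytest and ruff are assumptions):

    # Hypothetical sketch: a minimal CI quality gate that runs the same
    # checks on every change, whether a human or an LLM authored it.
    import subprocess
    import sys

    CHECKS = [
        ["pytest", "--maxfail=1", "-q"],  # unit tests must pass
        ["ruff", "check", "."],           # lint/static analysis must be clean
    ]

    def main() -> int:
        for cmd in CHECKS:
            print("running:", " ".join(cmd))
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print("quality gate failed on:", " ".join(cmd))
                return result.returncode  # fail the pipeline
        print("quality gate passed")
        return 0

    if __name__ == "__main__":
        sys.exit(main())

The point being: if gates like these are strong, the authorship of the code matters less to the quality you actually ship.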

That said, I would be very interested in studies you found interesting. I'm always looking for more empirical evidence!

dns_snek 13 minutes ago | parent

> I see people claiming 20 - 50%, which lines up with the studies above

Most of those studies either measure productivity using dubious metrics like lines of code or number of PRs, or draw their participants from organizations that are heavily invested in the future success of AI.

One of my older comments addressing a similar list of studies: https://news.ycombinator.com/item?id=45324157