We Are Changing Our Developer Productivity Experiment Design (metr.org)
30 points by ej88 5 hours ago | 20 comments

keeda 7 minutes ago
> When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

In fact, one of the developers in the original study later revealed on Twitter that he had already done exactly that during the study, i.e. filtered out tasks he preferred not to do without AI: https://xcancel.com/ruben_bloom/status/1943536052037390531

While this was only one developer (that we know of), given that N was 16 and he seems to have been one of the more AI-experienced devs, this could have had a non-trivial effect on the results.

The original study gets a lot of airtime from AI naysayers; let's see how much this follow-up gets ;-)
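
A toy simulation makes the selection argument concrete. Everything in it is a hypothetical assumption rather than METR data: each task gets a true AI uplift (fraction of time saved) drawn from a normal distribution, and developers withhold the 30% of tasks with the highest uplift. The 30% is arbitrary; the 30-50% figure from the survey counts developers, not tasks.

```python
import random
import statistics


def simulate(n_tasks: int = 10_000, withheld: float = 0.3, seed: int = 0) -> tuple[float, float]:
    """Toy model of task withholding (all parameters are hypothetical).

    Each task gets a true AI uplift (fraction of time saved, positive = faster).
    Developers withhold the `withheld` share of tasks with the highest uplift,
    so the study only observes the rest. Returns (true_mean, observed_mean).
    """
    rng = random.Random(seed)
    # Assumed per-task uplift distribution: mean 10% time saved, wide spread.
    uplifts = [rng.gauss(0.10, 0.30) for _ in range(n_tasks)]
    true_mean = statistics.fmean(uplifts)

    # Drop the highest-uplift tasks, i.e. the ones devs "don't want to do without AI".
    n_kept = n_tasks - int(withheld * n_tasks)
    observed = sorted(uplifts)[:n_kept]
    observed_mean = statistics.fmean(observed)
    return true_mean, observed_mean


if __name__ == "__main__":
    true_mean, observed_mean = simulate()
    print(f"true mean uplift:     {true_mean:+.3f}")
    print(f"observed mean uplift: {observed_mean:+.3f}")
```

Because the withheld tasks all sit above the ones that remain, the observed mean is always below the true mean in this model. Real withholding is noisier than a clean top-k cut, but any rule that preferentially drops high-uplift tasks pushes the observed average in the same direction, which is consistent with treating the measured speedup as a lower bound.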

atleastoptimal an hour ago
It's kind of funny that METR is known primarily for both the most bearish study on AI progress (the original 20% slowdown one) and the most bullish one (the long-task horizon study showing an exponential increase in the duration of tasks AI models can accomplish, as a function of release date). In either case, it seems people ended up bolstering their preexisting views on AI with whichever study most affirmed them: for the former, that AI coding models didn't actually help and created a mirage of productivity that required more work to fix than it was worth; for the latter, that AI models are improving at an exponential rate and will inevitably eclipse SWEs in all tasks within a predictable amount of time.

I think the truth is somewhere in the middle. Just anecdotally, we've seen multi-million-dollar fortunes minted by small teams whose code is 90% AI-assisted. Anthropic claims they solely use agents to write code and don't modify any code manually.

ej88 4 hours ago
Really interesting updates to their 2025 experiment:

- Repeat devs from the original experiment went from a 0-40% slowdown to a -10% to 40% speedup, and METR estimates this as a lower bound.
- More devs are saying they don't even want to do 50% of their work without AI, even for $50/hr.
- 30-50% of devs chose not to submit certain tasks rather than do them without AI, so the study misses the tasks with the highest uplift.
- There also seems to be a skill gap: repeat devs from the first study are more productive with AI tools than newly recruited devs with variable experience.

Overall, it seems like devs' strong preference for AI is actually hurting METR's ability to measure their speedup, because they refuse to do some tasks without it. IMO this is indirectly quite supportive of AI coding's productivity claims.

arctic-true 2 hours ago
Those developer quotes are tough to read. Rate limits are going to hit like a truck when the labs eventually need to make a profit. | ||||||||||||||||||||

daxfohl an hour ago
"I don't want to do this without AI" sounds like we're already well into the brain atrophy stage of this. Now what? (I'd think about it myself but....) | ||||||||||||||||||||

sgillen 2 hours ago
This is very interesting because I see a lot of AI detractors point to the original study as proof that AI is overhyped and nothing to worry about. In this new study the findings are essentially reversed (20% slowdown to 20% speedup). | ||||||||||||||||||||

camgunz 2 hours ago
Unless this measures the entire SDLC longitudinally (say, over a year), I'm not interested. I too can tell Claude Code to do things all day, every day, but unless we have data on the defect rate, it doesn't matter at all.

softwaredoug 4 hours ago
I'm a bit perplexed by the developer selection effects. I get that developers want to use AI. But are they also claiming there's no longer a no/low-AI population of developers? Or that their selection methods can't find these developers? Are they worried that by splitting devs into groups by AI experience, they might be measuring some confounder that causes people to choose AI or not AI in their careers?
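
The confounder worry can be sketched with another toy simulation. The model and numbers are hypothetical, not METR's design: log task time falls with skill, using AI shifts it by a fixed `ai_effect` (negative means faster), and more skilled developers are assumed to be more likely to choose AI. Comparing developers who chose AI with those who didn't folds the skill gap into the estimate, while assigning AI by coin flip does not.

```python
import random
import statistics


def compare_designs(n_devs: int = 20_000, ai_effect: float = -0.2, seed: int = 1) -> tuple[float, float]:
    """Toy model (all numbers hypothetical): log task time = -skill + ai_effect * used_ai + noise.

    Returns (observational_estimate, randomized_estimate) of ai_effect.
    """
    rng = random.Random(seed)
    obs = {True: [], False: []}  # grouped by whether the dev *chose* AI
    rct = {True: [], False: []}  # grouped by whether AI was *assigned* by coin flip

    for _ in range(n_devs):
        skill = rng.gauss(0.0, 1.0)
        # Confounder: in this toy model, more skilled devs are more likely to choose AI.
        chose_ai = rng.random() < (0.8 if skill > 0 else 0.2)
        obs[chose_ai].append(-skill + ai_effect * chose_ai + rng.gauss(0.0, 0.1))
        # Same dev under randomized assignment: independent of skill, so skill averages out.
        assigned_ai = rng.random() < 0.5
        rct[assigned_ai].append(-skill + ai_effect * assigned_ai + rng.gauss(0.0, 0.1))

    def diff(groups: dict[bool, list[float]]) -> float:
        # Mean log time with AI minus mean log time without AI.
        return statistics.fmean(groups[True]) - statistics.fmean(groups[False])

    return diff(obs), diff(rct)


if __name__ == "__main__":
    observational, randomized = compare_designs()
    print(f"observational estimate: {observational:+.3f}")
    print(f"randomized estimate:    {randomized:+.3f}")
```

With these made-up parameters, the observational estimate comes out several times more negative than the built-in `ai_effect` of -0.2 (AI looks far more helpful than it is), while the randomized estimate lands close to it. Randomizing which tasks get AI within each developer, as the original study did, removes developer-level confounders like skill in the same way, though it does not address how representative the recruited developers are.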

Bnjoroge an hour ago
Never been a better time to be a SWE who doesn't use AI agents, or who significantly limits their use.