nkmnz an hour ago

> A study from METR found that when developers used AI tools, they estimated that they were working 20% faster, yet in reality they worked 19% slower. That is nearly a 40% difference between perceived and actual times!

It’s not. It’s either 33% slower than perceived or perception overestimates speed by 50%. I don’t know how to trust the author if stuff like this is wrong.

jph00 43 minutes ago | parent | next [-]

> I don’t know how to trust the author if stuff like this is wrong.

She's not wrong.

A good way to do this calculation is with the log-ratio, a centered measure of proportional difference. It's symmetric, and widely used in economics and statistics for exactly this reason. I.e:

ln(1.2/0.81) = ln(1.2) - ln(0.81) ≈ 0.393

That's nearly 40%, as the post says.
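
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the reading used in this comment, where "20% faster" is a speed of 1.20 and "19% slower" is a speed of 0.81 relative to baseline. The helper name `log_ratio` is made up for illustration.

```python
import math

# Assumed reading of the quoted numbers (relative to a baseline of 1.0):
perceived = 1.20  # developers believed they were 20% faster
actual = 0.81     # measured result: 19% slower


def log_ratio(a: float, b: float) -> float:
    """Log-ratio of a to b: ln(a / b), which equals ln(a) - ln(b)."""
    return math.log(a / b)


gap = log_ratio(perceived, actual)       # perceived relative to actual
reverse = log_ratio(actual, perceived)   # actual relative to perceived

print(f"ln(1.2/0.81) = {gap:.3f}")
print(f"ln(0.81/1.2) = {reverse:.3f}")
```

Swapping the two arguments only flips the sign, which is the symmetry mentioned above: the size of the gap does not depend on which number you pick as the base.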

piker an hour ago | parent | prev | next [-]

I get caught up personally in this math as well. Is a charitable interpretation of the throwaway line that they were off by that many “percentage points”?

nkmnz an hour ago | parent [-]

That would be correct, but also useless. It matters if 50pp are 50% vs. 100%, 75% vs. 125% or 100% vs. 150%.

regular_trash an hour ago | parent | prev | next [-]

Can you elaborate? This seems like a simple mistake if they are incorrect, but I'm not sure where 33% or 50% come from here.

nkmnz an hour ago | parent | next [-]

Their math is 120%-80%=40% while the correct math is (80-120)/120=-33% or (120-80)/80=+50%

It’s more obvious if you take more extreme numbers, say: they estimated to take 99% less time with AI, but it took 99% more time - the difference is not 198%, but 19800%. Suddenly you’re off by two orders of magnitude.
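
A quick Python sketch of the same point, using the rounded 120 and 80 from this comment. The helper name `pct_change` is made up for illustration. The answer depends entirely on which number you divide by.

```python
def pct_change(new: float, base: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100


perceived, actual = 120, 80  # rounded figures used in this comment

slower_than_perceived = pct_change(actual, perceived)  # base = perceived
overestimate = pct_change(perceived, actual)           # base = actual

# Extreme case: estimated 99% less time (1% of baseline),
# but it actually took 99% more time (199% of baseline).
extreme = pct_change(199, 1)

print(f"{slower_than_perceived:.1f}%")  # -33.3%
print(f"{overestimate:.1f}%")           # 50.0%
print(f"{extreme:.0f}%")                # 19800%, i.e. a factor of 199
```

Subtracting the two percentages gives 40 and 198 points for these cases, which hides how different the two gaps are in relative terms.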

jph00 40 minutes ago | parent | prev [-]

It's not a mistake. It's correct, and is an excellent way to present this information.

softwaredoug an hour ago | parent | prev [-]

Isn't the study a year old by now? Things have evolved very quickly in the last few months.

nkmnz an hour ago | parent [-]

Yes. No agents, no deep research, no tools, and just Sonnet-3.5 and 3.7 - I’d love to see the same study today with Opus-4.6 and Codex-5.3