simonw 4 hours ago

I know it's popular to compare coding agents to slot machines right now, but the comparison doesn't entirely hold for me.

It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.

(I saw "no actual evidence pointing to these improvements" with a footnote and didn't even need to click that footnote to know it was the METR thing. I wish AI holdouts would find a few more studies.)

Steve Yegge of all people published something the other day with similar conclusions to this piece - that the productivity boost from coding agents can lead to burnout, especially if companies use it to drive their employees to work in unsustainable ways: https://steve-yegge.medium.com/the-ai-vampire-eda6e4f07163

saulpw 3 hours ago | parent | next [-]

Yeah, I'm finding that there's "clock time" (hours) and "calendar time" (days/weeks/months), and that pushing people to work 'more' rests on the fallacy that our productivity runs on clock time (like it does in a factory pumping out widgets) rather than calendar time (like it does in art and other creative endeavors).

Even if the LLM can crank out my requested code in an hour, I'll still need a few days to process how it feels to use. The temptation is to pull the lever 10 times in a row because it was so easy, but then I'll need a few weeks to process the changes as a human.

This is just for my own personal projects, and it makes sense that the business incentives would be even more intense. But you can't get around the fact that, no matter how brilliant your software or interface, customers are not going to start paying within a few hours.

simonw 3 hours ago | parent [-]

> The temptation is to pull the lever 10 times in a row because it was so easy, but now I'll need a few weeks to process the changes as a human

Yeah I really feel that!

I recently learned the term "cognitive debt" for this from https://margaretstorey.com/blog/2026/02/09/cognitive-debt/ and I think it's a great way to capture this effect.

I can churn out features faster, but that means I don't get time to fully absorb each feature and think through its consequences and relationships to other existing or future features.

mrbungie 4 hours ago | parent | prev | next [-]

If you're really good and fast at validating/fixing code output, or you're actually not validating it beyond making sure it runs (no judgment), I can see it paying out 95% of the time.

But from what I've seen validating both my own and others' coding agent outputs, I'd estimate a much lower percentage (Data Engineering/Science work). And, oh boy, some colleagues are hooked on generating no matter the quality. Workslop is a very real phenomenon.

biophysboy 3 hours ago | parent [-]

This matches my experience using LLMs for science. Out of curiosity, I downloaded a randomized study and the CONSORT checklist, and asked Claude Code to review the paper against the checklist.

I was really impressed with how it parsed the structured checklist. I was not at all impressed by how it digested the paper. Lots of disguised errors.
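
For anyone who wants to try something similar, here's a minimal sketch of that kind of run - the file names are hypothetical, and it assumes the Claude Code CLI's non-interactive "claude -p" print mode:

    import subprocess

    # Hypothetical inputs: trial.pdf is the study, consort_checklist.txt
    # is the CONSORT checklist saved as plain text in the same directory.
    prompt = (
        "Read trial.pdf and review it against every item in "
        "consort_checklist.txt. For each item, quote the supporting "
        "passage from the paper and flag anything missing or inconsistent."
    )

    # "claude -p" runs Claude Code non-interactively and prints the reply.
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
    )
    print(result.stdout)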

baq 3 hours ago | parent [-]

try codex 5.3. it's dry and very obviously AI; if you allow a bit of anthropomorphisation, it's kind of high-functioning autistic. it isn't an oracle, it'll still be wrong, but it's a powerful tool, completely different from claude.

biophysboy 3 hours ago | parent [-]

Does it get numbers right? One of the mistakes it made in reading the paper was swapping sets of numbers from the primary/secondary outcomes.

baq 2 hours ago | parent [-]

it does get screenshots right for me, but obviously I haven't tried it on your specific paper. I can only recommend trying it out; it also has much more generous limits on the $20 tier than opus.

biophysboy 2 hours ago | parent [-]

I see. To clarify, it parsed the numbers in the PDF correctly, but assigned them the wrong meaning. I was wondering if Codex is better at interpreting non-text data.

r00tanon 2 hours ago | parent | prev | next [-]

I was going to mention Yegge's recent blog posts mirroring this phenomenon.

There's also this article on hbr.org https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies...

This is a real thing, and it looks like classic addiction.

Retr0id 3 hours ago | parent | prev | next [-]

It's 95% if you're using it for the stuff it's good at. People inevitably try to push it further than that (which is only natural!), and if you're operating at/beyond the capability frontier then the success rate eventually drops.

fdefitte 3 hours ago | parent | prev | next [-]

That 95% payout only works if you already know what good looks like. The sketchy part is when you can't tell the diff between correct and almost-correct. That's where stuff goes sideways.

Kiro 3 hours ago | parent | prev | next [-]

Just need to point out that the payout is often above 95% at online casinos. As long as it's below 100%, the house still wins.

mikkupikku 3 hours ago | parent [-]

He means a slot machine that pays you 95% of the time, not a slot machine that pays out 95% of what you put in.

Claude Code wasting my time with nonsense output one in twenty times seems roughly correct. The rest of the time it's hitting jackpots.

fy20 3 hours ago | parent | prev | next [-]

> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it

Right, but the <100% chance is actually why slot machines are addictive. If it paid out every time, the behaviour wouldn't persist as long once the payouts stopped. It's called the partial reinforcement extinction effect.

jrflowers 3 hours ago | parent | prev [-]

> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.

“It’s not like a slot machine, it’s like… a slot machine… that I feel good using”

That aside, if a slot machine is doing your job correctly 95% of the time, it seems like either you aren't noticing when it's doing your job poorly, or you've shifted the way that you work to only allow yourself to do work that the slot machine is good at.