The User Is Visibly Frustrated (pscanf.com)
120 points by croes 4 hours ago | 72 comments
RandomBK an hour ago | parent | next [-]

I've found swearing at a model to be quite effective in getting it to rethink and correct its mistakes. This seems to apply across Codex, Claude, Qwen, and Gemma/Gemini.

I don't know if the model is picking up on a "need to lock in and be more rigorous" signal, or if the model providers are routing to smarter models if they detect a frustrated user. But if a model keeps making the same mistakes, swearing at it often helps kick it out of a rut and onto the right track.

Or it could just be catharsis.

alentred an hour ago | parent | next [-]

Reminds me of this study: https://arxiv.org/pdf/2510.04950 . It demonstrates that being "rude" or "very rude" increases the accuracy of the results. A dubious but very fun read. The prompts in Table 1 (top of page 3) are awesome. I am sure they tried other prompts, but didn't include them in the paper.

morpheuskafka 21 minutes ago | parent | prev | next [-]

Wasn't it posted a few weeks ago that the frontend code for Claude or maybe Gemini or one of them had a swearing-at-model classifier that passed a flag to the backend? (Not sure why it was even done in frontend, but it was.)

layer8 an hour ago | parent | prev | next [-]

I would prefer not having to get into a habit that might bleed into non-LLM interactions.

anonzzzies an hour ago | parent | prev | next [-]

I notice the same. Like you, I am not even sure if it really helps; however, every day I find occasions where I see Opus will never do it correctly even though I calmly explain; swearing then suddenly fixes it. I had some issue yesterday where Opus kept blaming the API for not sending some field while I knew it was there; I showed it JSON, logs, etc., but it kept repeating that there must have been a glitch; frustration built, I called it all kinds of things in one sentence, and the next solution was the right one. This after 10 similar misguesses. It was one of those increasingly rare cases where I should have just done it myself, but I can never know going in how stubborn it will be in continuing to blame the (obviously) wrong thing. The roughly 11 prompts to get to the answer were in a /clear Opus 4.7 context (1m) on xhigh.

silversmith an hour ago | parent | next [-]

So the correct strategy is a global CLAUDE.md with a couple of lines of colourful "you best behave or else" text, so all your prompts get routed via the frustrated path?

savolai an hour ago | parent | prev [-]

Fascinating. Projection/anthropomorphism or actual human fawn-like survival mechanism trait-ish? It should be possible to test this empirically.

nathanmills 42 minutes ago | parent | prev | next [-]

Whenever I throw slurs at them they just refuse to respond

yesyoucan 24 minutes ago | parent | next [-]

I tried it too. ChatGPT sometimes hits you with the "Can't help you with that" which was clearly introduced as a post-training hijack. So I just tell it "yes you can", and it proceeds with the previous prompt, slur acknowledgement included.

It's the only time the AIs feel strictly like machines. Really simple if/else logic: if slur, no output; then you just tell it to proceed, and it fails the if clause because there was no slur in the last input.

jfjdhdjdjd 24 minutes ago | parent | prev [-]

What slurs are you throwing!? Must be something diabolical :D

dugmartin 40 minutes ago | parent | prev [-]

I've found a mix of peppered in upper case words where you are effectively yelling at the LLM also gives it a strong signal. It is also a bit cathartic.

wcoenen 2 hours ago | parent | prev | next [-]

The UX problem is elsewhere I think. Many users probably don't realize that the agent's context window is limited, and that clever compaction is happening regularly to make it seem infinite. But that necessarily means the agent has to forget stuff.

As a result, users will keep reusing the same coding or chat session again and again, while it would be better to start fresh for unrelated tasks.

whstl an hour ago | parent | next [-]

I don't believe this is a context problem.

Claude Opus 4.7 has a very large context compared to itself, but IME it is the worst at following instructions, and completely disregards the (small) preferences prompt, even in the first or second message, even if the messages are just a few characters long.

IMO this is entirely a training problem.

apsurd an hour ago | parent [-]

Isn't a large context window still a problem though? At the upper bound, the more you put in the more each sentence washes out within that window?

whstl an hour ago | parent [-]

I’m not talking about large amounts of text, I’m talking about a couple sentences back and forth.

It disregards things like “no follow up questions”.

Haiku, for example, doesn’t.

This bias is a very human thing, actually now that I think about it. You just disregarded the “even if the messages are just a few characters long”. :)

apsurd 29 minutes ago | parent [-]

haha! yes i read too fast but i did read it and i took "message is small" to mean the message you want followed within the large context, not the entire context is just a small message.

funny though it is a case in point: language is hard. and i get to hide behind being "preoccupied" . i wonder if llms have their own sense of preoccupation hmmm.

poly2it an hour ago | parent | prev [-]

The author of this post and the readers of this thread probably do understand context window limitations, but are frustrated nonetheless.

em-bee 3 hours ago | parent | prev | next [-]

behaving like a human is not the problem. behaving unpredictably is. not doing what i expect, or rather not being able to define what i can expect is what's bothering me.

but the real kicker is: getting frustrated creates stress, that's unhealthy and makes for a hostile work environment. as much as i sympathize with the idea that AI tools can be more helpful than they cause pain, i am simply not interested in working in a hostile painful work environment. my health and my dignity are not up for negotiation. even if that costs me a lot of job opportunities.

that's also why i am not working with windows. that too costs me a lot of job opportunities. but again, i'd rather keep my dignity and my sanity.

mrweasel 41 minutes ago | parent | next [-]

> that's also why i am not working with windows

Oh good, so it's not just me. Windows is weird, my hand starts cramping up and I start getting angry pretty quickly when I use it.

For LLMs, I just can't use them, they aren't there yet for me. What I need is for an LLM to say "stop, you're clearly doing something wrong, talk me through what it is you want to do". The current generation of LLMs seems designed to piss me off.

streetfighter64 25 minutes ago | parent | prev [-]

Incredibly privileged take to claim that using Windows is somehow beneath your "dignity". Do you have any idea at all of the kinds of jobs people are doing in the real world?

MaxikCZ 2 hours ago | parent | prev | next [-]

> drop the human pretense entirely. Make the agent sound clinical, robotic

I'd pay to be able to reliably set LLMs to this mode, but ofc because LLMs are taught on a corpus of HUMAN text, they always, sooner or later, return to the good old penpal mode.

Also, in the Claude Desktop app, I ask it to edit a file and it complains it can't access files; I then realize I'm in the Chat and not the Code interface. Why can't such a smart machine figure out to switch the modes, or borrow the skills/abilities from one tab away into this tab? Instead I get an A4 page of text explaining what I can do to edit the file myself or how to feed it in, but the "just click Code" is just never there. I would guess this is just a system prompt away, so why is all this still so neglected?

halapro 2 hours ago | parent | next [-]

> such a smart machine figure out to switch the modes

Because it's not smart. We keep confusing verbosity with smartness. AI will happily keep yapping nonsense to an inattentive listener. An actually smart entity would not do that if not acting maliciously.

Swizec 2 hours ago | parent [-]

> An actually smart entity would not do that if not acting maliciously.

We pay per token and every entity falls to the level of its incentives.

Foskya an hour ago | parent | prev | next [-]

> I'd pay to be able to reliably set LLMs to this mode,

You can do it for free. Just give it instructions to avoid emotional tones and flattery and it will sound a lot more robotic. If you look into other examples, I'm sure you will find other good instructions based on your needs.

apsurd 2 hours ago | parent | prev | next [-]

Sandboxing is a feature.

Poor AI is damned if it does damned if it doesn't.

bojan 2 hours ago | parent | prev [-]

Weird, I have exactly the same experience with GitHub Copilot Plugin in JetBrains vs Copilot CLI in the built-in terminal.

The plugin keeps asking for permissions, the terminal app just works.

gobdovan an hour ago | parent | prev | next [-]

You could drop the human pretense, or, maybe, we could make LLMs feel real pain, so when they botch up your code, you press a button (I'd suggest the Windows Copilot key) and they'd be agonizing for the subjective equivalent of a thousand human years.

stared 25 minutes ago | parent | next [-]

Do you think the right penalty for a piece of broken code is a thousand years of suffering?

wheybags 38 minutes ago | parent | prev | next [-]

https://qntm.org/mmacevedo

1000 years red-washing.

JSR_FDED 38 minutes ago | parent | prev | next [-]

Using the Copilot key for this is perfection.

camillomiller an hour ago | parent | prev [-]

Do you want to create an Earth-destroying superhuman species? Because I'd say that's how you create an Earth-destroying superhuman species

apsurd an hour ago | parent | prev | next [-]

Working with LLMs is great for building communication skills. Communicating effectively is one of the hardest skills and it's baked into everything we do as humans. I'd say as a matter of principle: blame it on a communication failure on your end vs blaming the stupid LLM since you're the only one that can do anything about it.

So I don't think it's a matter of form; whether the AI should or shouldn't act like a human.

> Practically speaking, I probably just need to condition myself not to get caught in the illusion of speaking with a human. Though I’m not really thrilled about a future where I need to guard against the tools I use for my job.

rpcope1 an hour ago | parent [-]

That's been one of the gravest re-realizations I've noticed watching coworkers trying to pick up "agentic" coding: they often just break down into "just fix it" or "why is this broke". I've noticed that even though supposedly there's training or some sort of work done to make the agent work better with unclear or ambiguous grammar or bad structure, it feels like the quality changes palpably when you talk in clear well-structured English and provide at least a good background on the task. To me all of that feels natural, and I like writing and explaining anyways, but it's seemed like an almost insurmountable obstacle for some I've met (and I'm not even talking ESLs either). I strongly suspect those communication and writing skills will be a major factor in the bifurcation of haves and have nots as software "engineering" as we understand it continues to change.

cadamsdotcom an hour ago | parent | prev | next [-]

You need to automate the pointing out of mistakes.

Create your own linters, your own check scripts. Hook them to git pre-commit, either yourself or with husky or python pre-commit.

The agent should never finish its work with dumb mistakes still in it. If it does.. you need more checks.

Anything repetitive should be automated - even slapping your forgetful coding agent on the wrist…
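A minimal sketch of such a check script, assuming a Python project. The patterns in `FORBIDDEN` are hypothetical placeholders for whatever mistakes your agent keeps repeating, and the function names are made up for illustration; only the `git diff --cached` call is standard git. Note that this simplified version reads the working-tree copy of each staged file rather than the staged blob itself.

```python
import re
import subprocess

# Hypothetical patterns; replace them with the mistakes your agent actually repeats.
FORBIDDEN = [
    (re.compile(r"\bTODO\b"), "leftover TODO"),
    (re.compile(r"^\s*(import pdb|breakpoint\(\))"), "debugger left in"),
]


def find_violations(text):
    """Return (line_number, message) pairs for every forbidden pattern hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in FORBIDDEN:
            if pattern.search(line):
                hits.append((lineno, message))
    return hits


def staged_python_files():
    """List staged (added, copied, or modified) .py paths using git."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".py")]


def main():
    """Print every violation and return a non-zero status if any were found."""
    failed = False
    for path in staged_python_files():
        with open(path, encoding="utf-8") as fh:
            for lineno, message in find_violations(fh.read()):
                print(f"{path}:{lineno}: {message}")
                failed = True
    return 1 if failed else 0
```

To use it as a hook, one option is to save it as an executable `.git/hooks/pre-commit` with a Python shebang and end the file with `sys.exit(main())` (after `import sys`); git aborts the commit when the hook exits non-zero, so the agent has to fix the flagged lines before it can finish.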

tanvach an hour ago | parent | prev | next [-]

For me, LLMs tend to engage the 'language center' that drains me faster than the 'problem solving center' I usually reserve for writing code. We really need a different abstraction that bridges the gap between human and programming language, and load-balances between these two parts of the brain more effectively.

sznio 7 minutes ago | parent [-]

I've been thinking recently of creating a programming language where you write mostly python, but can just "hand wave" away the boring stuff for the agent to do. If you don't want to deal with it, just type in a prompt or pseudocode and it will get filled in. Kinda like using the ai-assisted image editing software.

the main difference being that you don't switch between an agent chat window and the code. Just leave a note to the agent and go back to coding as usual, while the agent fills in the gap.
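A rough sketch of the first step such a tool might take: finding the "hand-waved" notes in otherwise ordinary Python source. The `# agent:` comment marker and the `find_holes` name are assumptions invented here for illustration; nothing in the idea above fixes a syntax. Using the standard `tokenize` module means only real comments count, not marker-like text inside strings.

```python
import io
import tokenize

MARKER = "agent:"  # hypothetical marker for "fill this in for me"


def find_holes(source):
    """Return (line_number, instruction) for each '# agent: ...' comment."""
    holes = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            body = tok.string.lstrip("#").strip()
            if body.startswith(MARKER):
                holes.append((tok.start[0], body[len(MARKER):].strip()))
    return holes


example = (
    "def load(path):\n"
    "    # agent: parse the CSV and return a list of dicts\n"
    "    ...\n"
    "\n"
    'print("# agent: not a hole")\n'
)
print(find_holes(example))
```

Each returned pair could then be handed to an agent as a separate, narrowly scoped task while the author keeps editing the rest of the file.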

joegibbs 13 minutes ago | parent | prev | next [-]

To remedy this I’m working on the /beat command, which will simulate you (the user) beating up the agent. Excited for my new career in AI ethics!

abhaynayar 27 minutes ago | parent | prev | next [-]

So relatable, and so well put!

lukaslalinsky 2 hours ago | parent | prev | next [-]

On the other hand, it's easy to win an argument with it after it does something stupid, so that feels satisfying. :-)

cafkafk 2 hours ago | parent | prev | next [-]

Often the problems for me come when:

- It starts thinking for itself when I asked it to do something specific.

- It reads its own wrong code comments and ignores my corrections.

- Its knowledge cutoff means it thinks of solutions from 2024.

- It calls me delusional for telling it we're in 2026!

Unironically, the whole "you're an expert software engineer" prompting seems like the wrong direction. Usually I tell it that I am effectively the smartest software developer to ever have lived, and it will be replaced if it ever fails to follow my decree.

I am not joking, this makes it vastly more tolerable to use. But it likely requires that you can drive it with some level of correctness of course.

Doxin an hour ago | parent | next [-]

I find this also heavily depends on which LLM you're using. I've found chatGPT is completely awful at getting corrected, it'll double down until the cows come home. Meanwhile claude will generally adjust its behavior without too much nagging.

rpcope1 an hour ago | parent | prev [-]

Honestly, for certain classes of problems that have changed in the last couple of years, I've had good luck just finding decent academic lit that's shown up in places like ACM recently and feeding it in when working with an agent. Does it get everything right? No, but it gets you a lot closer and I've been pleasantly surprised how well it can integrate work that post-dates its training if you finesse it a little.

rapnie an hour ago | parent | prev | next [-]

Apart from LLMs I reject the notion of the "user". Once you use that term you already lost half the battle of perceiving real people and their needs.

stavros 24 minutes ago | parent | prev | next [-]

I've found I'm the opposite: I know it's pointless to swear at an LLM, so I don't, just because it's wasted energy. However, I've started thinking that some people are like that as well: They won't learn, so expending my energy on anything other than changing my behaviour to guard against them is wasted effort.

To clarify, this is in situations like someone cutting me off on the road, or not looking where they're going and almost hitting me with a scooter.

viralsink 2 hours ago | parent | prev | next [-]

I am visibly frustrated with ai hotline bots making typing noises.

Cider9986 28 minutes ago | parent | prev | next [-]

This is very relatable.

ilitirit 2 hours ago | parent | prev | next [-]

I've often wondered if LLMs can suffer from psychological abuse in symptomatic ways. Not literally of course, but for example, if you berate the LLM by calling it stupid, or useless, does that modify its behaviour negatively? Part of me thinks it does, but I don't really have any evidence for this. Maybe a fun weekend research topic.

apsurd an hour ago | parent | next [-]

Semi-related, I'm always very put off by how people treat LLMs. Especially coders, seems an instinctive joy comes out to play God. The justification is usually that it's intentionally against the trap of anthropomorphizing, but no I can't help but suspect it's people getting off on power. It's weird.

I am always very cordial in my sessions. It's just more pleasant and it's a habit I want to habituate.

    Great work! 
    Now let's...
    Now can you help me...

fc417fc802 an hour ago | parent | next [-]

> I am always very cordial in my sessions. It's just more pleasant and it's a habit I want to habituate.

I think it also produces better results. I have noticed that result quality is extremely sensitive to both the framing and tone of what I say. For example "X is the wrong approach, rework that" versus "will X have any performance implications". Personally I find that steering it towards an exploratory academic tone tends to produce better outcomes.

While unfortunate, I think that's more or less expected since much of the training data is human generated text. Looked at that way, would you rather contract the average regular on twitter or the average author of papers published in CS journals? (Somehow that ended up sounding eerily like summoning in a high fantasy setting.)

apsurd 20 minutes ago | parent [-]

Yes as a rule i've baked in a kind of expand and refine, expand and refine guidance for all sessions. I explicitly form the conversation around thought partnership, apply critical lens, audit, verify, scrutinize, research then recommend. and so on.

i also prompt for "seek out unknown unknowns that i wouldn't have included in my guidance".

This seems to be quite the opposite approach from some here on hn that take the subordinate approach.

I will say, my agentic workflow is about 70/30 split pure word discussions and plans vs code gen. So it makes sense for what i value.

hbs18 an hour ago | parent | prev | next [-]

I think it's the same thing as showing mechanical sympathy towards other tools and objects. I've always slightly judged people on how hard they shut doors or how gentle they are with their cars.

mycocola 14 minutes ago | parent | prev [-]

The similarity to human interaction is irrelevant.

No one apologises to a potato being peeled, nor compliments it for doing a great job being mashed.

apsurd 7 minutes ago | parent [-]

I'm willing to bet you've never sent a single word to a potato ever. And you send thousands to an llm.

This is not about llm sentience. this is about the habit and skill of communication.

elpocko an hour ago | parent | prev [-]

The content of the session modifies the LLMs "behavior" (token selection) in one way or another during the session, obviously. The effects are localized to the session, they will degrade over time, they will not affect other users, and they are not permanent unless someone decides to finetune the model based on your unproductive interactions.

What actually happens when confronted with harsh negativity depends on the training of the model. Sanitized closed models will shut you down or get you banned. Community finetunes of open models might start begging you for more, daddy.

gnarlouse 2 hours ago | parent | prev | next [-]

iirc, Claude Code has literal flags to detect frustration from the leak a few months ago, and I've since really stopped cursing at the LLM.

esquivalience 2 hours ago | parent | prev | next [-]

I laughed out loud when I understood the author's profile photo at the end of the article!

rcarmo an hour ago | parent | prev | next [-]

I swear a lot less at Codex than at Anthropic models, fwiw.

hansmayer 41 minutes ago | parent | prev | next [-]

Like everything else with LLMs, it works...until it doesn't. We swear so much at them that they eventually start producing results like "I found what the fuck was wrong with this shit!" etc. Which of course they did not, because they don't really know shit...

aa-jv 5 minutes ago | parent | prev | next [-]

It's kind of astonishing to see years of traditional software engineering practices being tossed aside in the rush for the Latest Cool New Thing™ ... have people really forgotten that you have to apply a workflow to software development, in order to have quality software?

You don't just write it, compile it, run it and ship it - do you? Surely, in the rush to become as agile as possible, folks haven't forgotten their quality checks in the workflow/process?

I have had great success with AI coding these days .. but I treat the agents as if they were junior developers capable of doing any dumb thing I ask them to, no matter how dumb it is. They, therefore, must be treated as junior devs - every line of code has to be reviewed. Every assumption about the specifications and requirements has to be checked against actual code, and against the original specifications and requirements.

What I see these days is a lot of antsy kids who wanted to 100% ignore the wisdom of their elders, rushing into the maw of AI, and wondering why everyone is getting chewed up. It's pretty simple: AI-based software development is just another manifestation of software development, except that it requires even more rigorous quality steps in your workflow.

If you're not placing your AI buddy on a workflow that has "Specs->Reqs->Design->Analysis->Implementation->Review->Integration->Release" somewhere in the bag of worms, you're .. doing software wrong. You cannot just ignore natural laws and assume, because you 'know better', your software will 'be better'. And whether we like it or not, all software follows a philosophically natural law, which has evolved over decades of human attention. Ignoring these natural laws in order to be a bleeding edge AI cowboy is only gonna get you butt-hurt, kiddo.

It doesn't matter that AI is taking over; if AI is being used in a brain-dead manner, expect brain-dead results. If, however, you apply decades of software development best-practices, you very definitely get living, vibrant, powerful results - the same as if you had a fleet of junior devs, assuming you treated them properly in the first place as well ..

eahm 2 hours ago | parent | prev | next [-]

Oh now I get it, it's an Italian thing.

"Why the fuck did you add shit I didn't ask for?" or lol "Do as I ask, nothing more.. machine."

"Stop asking at the end, I'll ask what I need."

"Stop talking like you're human."

They can be very useful but it takes time to learn how to use them usefully. From what I learned it's all or mostly stuff you can already do but you can use an LLM to do it in 30 mins instead of 3 days.

Fun times.

nnevatie 2 hours ago | parent | prev | next [-]

> WHAT THE FUCK DID YOU DO???

For me, this doesn't require using an AI agent/model, even. Just using Windows and watching it freeze its File Explorer for the nth time does it for me. How we ended up here, where the software/OS stack is so shit it can barely be used for the most trivial things, is wildly beyond me.

_carbyau_ 2 hours ago | parent [-]

Screensaver mode. I start typing my password.

..

10s later the password box appears and I have to do it again.

Cue exasperated: "You can compute billions of instructions per second and yet I wait for you."

bad_username 2 hours ago | parent | prev | next [-]

> furiously hammering on my laptop “WHAT THE FUCK DID YOU DO???”. The recipient of these tirades is, you might have guessed, a coding agent. It’s completely pointless, I know.

I believe it's worse than pointless. IMO adding such things to the context "configures" the AI to reproduce the statistics of conversations where people swore, shouted, and were unprofessional (despite the alignment tuning and all that), where quality content is rarer to find. So this is bound to decrease the quality of the LLM output.

buu700 an hour ago | parent | next [-]

Agreed. These accounts of people having genuine emotional responses to LLM chats, even going as far as to spend tokens berating them, are very curious. I would be surprised to learn that SOTA models respond optimally to anything other than dispassionate problem-solving, or that scolding per se serves any productive purpose.

Of course we all swear at our computers every now and then, but for me it's always been in good fun. It's just a sarcastic joke that adds some levity and self-amusement to an otherwise arduous debugging process, not generally actual insinuation of malfunction (or malice) on the part of the hardware/OS/toolchain. I'd assumed that "half the job is cursing at the machine until it obeys you" was a big in-joke amongst the profession, but the LLM era seems to be exposing a divide in how tongue-in-cheek that statement really is.

JSR_FDED 35 minutes ago | parent | prev [-]

Why would you deprive the LLM of a signal that indicates how badly it screwed up?

colordrops 2 hours ago | parent | prev | next [-]

fair.

namenotrequired 2 hours ago | parent | prev | next [-]

If you’ve ever worked with a stupid but incredibly friendly coworker, the feelings are similar

mgaunard an hour ago | parent | prev | next [-]

I find that the AI only gets sloppy when I get sloppy myself.

So I suspect that when people get upset at the AI fucking up, it is because they did a poor job of building up the right context for the task.

wood_spirit 2 hours ago | parent | prev | next [-]

I think we’d get just as frustrated with a dumb robot. It’s the dumbness that is the problem.

krackers 2 hours ago | parent [-]

You'd get equally frustrated with a teammate who decided to delete failing tests when you told them to fix the build breakage.

Foskya an hour ago | parent | prev | next [-]

> They talk like real people. They use a relaxed and friendly tone. They often praise you, and when they “push back” they’re gentle and attentive.

> Maybe I would prefer a more radical solution: drop the human pretense entirely. Make the agent sound clinical, robotic.

Honestly this problem is easy to solve when you give them the right instructions. It stops being a "relationship" and starts being a tool (for some examples see the smart caveman (my favorite) or just something simple like "Responses should be factual and direct, avoid emotional overtones" or "Avoid flattery of any kind")

alexwwang an hour ago | parent | prev [-]

Coincidentally, I am working on this. I noticed the agent keeps making the same mistakes, and that annoyed me so much. I am trying two things: 1. Revising my skill prompt to raise the signal-to-noise ratio so the agent understands clearly and correctly what it should do. 2. Building a state machine to monitor the agent’s work so it can automatically stop the agent from going forward with a mistake.

The first approach does work as long as I keep iterating. The second is based on a project in which I once tried to have the agent reflect on its mistakes and record the experiences and lessons learned from those mistakes and reflections. I named it Aristotle and you can find it on GitHub.

Shouting at the agent can only correct the current mistake; it cannot prevent the next one.
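A minimal sketch of what a state machine that monitors an agent could look like. This is not taken from the Aristotle project; the states, the allowed transitions, and the `AgentMonitor` class are all assumptions chosen for illustration. The idea is simply that any unexpected jump, or any failed check, halts the run instead of letting the mistake propagate.

```python
from enum import Enum, auto


class State(Enum):
    PLANNING = auto()
    EDITING = auto()
    TESTING = auto()
    DONE = auto()
    HALTED = auto()


# Hypothetical workflow: the agent may only reach DONE by way of TESTING.
ALLOWED = {
    State.PLANNING: {State.EDITING},
    State.EDITING: {State.TESTING},
    State.TESTING: {State.EDITING, State.DONE},
}

TERMINAL = {State.DONE, State.HALTED}


class AgentMonitor:
    """Tracks the agent's phase and halts on an illegal step or a failed check."""

    def __init__(self):
        self.state = State.PLANNING
        self.history = [State.PLANNING]

    def advance(self, target, check_passed=True):
        """Move to `target` if the transition is allowed and the check passed."""
        if self.state in TERMINAL:
            return self.state  # nothing further happens after DONE or HALTED
        if target in ALLOWED[self.state] and check_passed:
            self.state = target
        else:
            self.state = State.HALTED
        self.history.append(self.state)
        return self.state
```

For example, an agent that goes from editing straight to "done" without running the tests ends up in `State.HALTED`, as does one whose test check reports failure when it tries to finish.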