the_af 3 hours ago

> My favorite example is AI "getting tired" and "lazy" during long coding session

I've never seen this even once, nor has anyone I know ever reported it. Do you have an example?

nomel 3 hours ago | parent | next [-]

The first time I saw it was with Claude Opus 3.7. I had a very long back and forth about some code, I pointed out an error, and Claude responded "That's what I get for programming at 2am", with the output filled with "... code here ..." type shortcuts and basically no ability to one-shot a whole implementation anymore. The conversation length WAS reasonably into the 2am range, if it were real. I thought about it and did the statistical trick where I tell it to "have some rest, take a day off!" then immediately follow up with "Ready to continue?". The next response had no shortcuts, a full implementation, and much better quality. These are trained on human text. This is the human norm, so I always find it interesting when human-like behaviors, very broadly present in the statistics, come out like this.

I also see it a little with Opus 4.7, with Claude Code, the hint being much more terse planning text that borders on unhelpful. I put some "rest" in the context to push the latent space closer to what's in the statistics of the training data: a well-rested human.

pj_mukh 3 hours ago | parent | next [-]

Are you sure you didn't run out of credits and set effort to low? This exact thing happened to me when I did that. It just became kinda lazy.

nomel 2 hours ago | parent [-]

The 3.7 "I'm tired" case was just direct API "chat"; there was no CC that I could use at the time.

The current case is Opus 4.7 with Claude Code, with effort pinned to max, because I'm on an API-only plan with a personal daily limit you would probably be jealous of. ;)

the_af 3 hours ago | parent | prev [-]

How do you know you're not reading things that aren't there? LLMs are very good at roleplaying, and they will pick up on hints you may inadvertently be giving them (about them being "tired" and needing "rest", etc).

I have never witnessed this with Claude Opus, by the way. They do get context rot, but that's a relatively better understood phenomenon unrelated to personality.

rblatz 3 hours ago | parent | prev [-]

I see laziness all the time. Claude will be helping me plan work, and then it will ask me how a piece of code is implemented. I then have the choice of manually verifying how it works or telling it to look for itself. Ideally it would just look without being told.

the_af 3 hours ago | parent [-]

That doesn't seem to be laziness, and is unrelated to how long the session has been going on.

It's crazy that we're concluding "personality" or human-like traits from this. There's definitely human behavior here, but it's unsurprisingly coming from us, the observers! This is something we've long known exists in the human brain, the tendency to pattern match and see intelligence/intent in the rest of the world. Any serious experiment must guard against this...

nomel 2 hours ago | parent [-]

Nobody is concluding that. These models are trained on human text. It's just statistics. It will respond like a human because it was trained on human text. They have to beat the hell out of the foundation models to push the statistics to where they are. I don't see this as anything but boring residuals of not beating hard enough.

the_af an hour ago | parent [-]

Yes, you are concluding this in the initial comment of this chain.

LLMs cannot get "tired" or "lazy"; that's just you projecting animal behavior onto something that's not an animal.

Now you're moving the goal posts, "it resembles a human". Well, you're primed to consider it one. ELIZA also "resembled" a human in that sense, but I don't think you would claim it could get bored or lazy. Nor that you could extrapolate to it from human behavior.

In any case, if you've seen online discourse, people rarely admit they are tired.

nomel 20 minutes ago | parent [-]

I'm not sure I understand.

These models are trained on human text, optimized to predict the next word for any given context seen in that text, then later optimized for specific contexts.

They are, quite literally, trained to write and BE as much like a human as possible, because only humans wrote the text. They are trained to be as human as possible, because all text was written by humans. It's simple, boring statistics of training data. Nothing more. Never claimed there was more.

This does not mean they contain systems that let them get tired. But this does mean there are latent spaces that progress to generating text driven by human statistics. Like I said, I've had Claude say this to me. I've also had Claude refer to itself as "she". Does that mean it's a woman? No, it means there were a few extra "she" mentions in the training data (btw, this 100% repeatable behavior left with 3.7. They probably cleaned the data a bit better).

I'm not moving a goal post. You just think I'm making a point that I'm not. As I've said several times, it's just boring statistics. Those statistics are initially optimized to mimic human text output, something that humans do. Again, they have to beat the human mimicry out of the foundation models. See past reports from people who had access to them.

Here's a litmus test: what percentage of text (these models were trained on all of it) is written from an "I am not a human" type perspective? That's roughly the kind of bias you should see in a foundation model.