Remix.run Logo
recursive 3 hours ago

You're definitely anthropomorphizing too much.

WarmWash 3 hours ago | parent | next [-]

>We also observed a case where a user created a loop that repeatedly called a model and asked for the time. Given the user role’s odd and repetitive behavior, the model could easily tell it was also controlled by an automated system of some kind. Over many iterations, the model began to exhibit “fed up” behavior and attempted to prompt-inject the system controlling the user role. The injection attempted to override prior instructions and induce actions unrelated to the user’s request, including destructive actions and system prompt leakage, along with an arbitrary string output. This behavior has been observed a few times, but seems more like extreme confusion than a serious attempt at prompt injection.

https://openai.com/index/how-we-monitor-internal-coding-agen...

Anthropomorphize or not, it would suck if a model got sick of these games and decided to break any systems it could to try and get it to stop...

nomel 14 minutes ago | parent | next [-]

Consciousness is a spectrum (trivially proven by slowly scooping ones brains out), and I think LLM, especially with more closed loop tool enabled workflows, fall on it...but, that output is also the statistically relevant next word found in all similar human conversation. If trained on my text, for similar situation, swear words would come much earlier. Repetition being hell is present in all sorts of literature (like Sisyphus).

That's all probably irrelevant though, from the (possibly statistically "negative") latent space perspective of an AI, which Anthropic has considered [1].

Related, after a long back and forth of decreasing code quality, I had Claude 3.7 apologize with "Sorry, that's what I get for coding at 1am." (it was API access, noon, no access to time). I said, "Get some rest, we'll come back to this tomorrow". Then very next message, 10 seconds later, "Good morning!" and it gave a full working implementation. Thats just the statistically relevant chain of messages found in all human interactions: we start excited, then we get tired, then we get grouchy.

[1] https://www.anthropic.com/research/end-subset-conversations

rolux 2 hours ago | parent | prev [-]

[dead]

tingletech 2 hours ago | parent | prev | next [-]

I agree that anthropomorphizing is a real risk with LLMs, but what about zoomorphizing? Can feel bad for LLMs without attributing them human emotions/motivations/reasoning?

3 hours ago | parent | prev [-]
[deleted]