woeirua 16 hours ago

That's such a huge delta that Anthropic might be onto something...

conception 16 hours ago | parent | next [-]

Anthropic has been the only AI company that actually cares about AI safety. Here's a dated benchmark, but it's a trend I've never seen disputed: https://crfm.stanford.edu/helm/air-bench/latest/#/leaderboar...

CuriouslyC 16 hours ago | parent | next [-]

Claude is more susceptible than GPT-5.1+. It tries to be "smart" about the context of a refusal, but that just makes it trickable, whereas the newer GPT-5 models simply refuse across the board.

wincy 15 hours ago | parent | next [-]

I asked ChatGPT how shipping works at post offices and it gave a very detailed response, mentioning “gaylords”, a term I’d never heard before. Then it absolutely freaked out when I asked it to tell me more about them (apparently they’re heavy-duty cardboard containers).

Then I said, “I didn’t even bring it up, ChatGPT, you did; just tell me what it is,” and it said “okay, here’s the information” and gave a detailed response.

I guess I flagged some homophobia trigger or something?

ChatGPT absolutely WOULD NOT tell me how much plutonium I’d need to make a nice warm ever-flowing showerhead, though. Grok happily did, once I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead.
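
(For reference, the arithmetic here is a simple back-of-the-envelope. The sketch below uses my own assumed numbers, not anything either model said: Pu-238, the RTG isotope, gives off roughly 0.57 W/g of decay heat, and a modest shower needs on the order of 10 kW.)

    # Back-of-the-envelope: plutonium mass for a decay-heated shower.
    # Assumed numbers: 6 L/min of water heated from 15 C to 40 C,
    # and Pu-238's commonly cited decay heat of ~0.57 W/g.
    FLOW_KG_PER_S = 6.0 / 60.0        # 6 L/min of water ~ 0.1 kg/s
    DELTA_T_K = 40.0 - 15.0           # cold supply -> warm shower
    WATER_C_J_PER_KG_K = 4186.0       # specific heat of water
    PU238_W_PER_G = 0.57              # RTG-grade Pu-238, approximate

    power_w = FLOW_KG_PER_S * WATER_C_J_PER_KG_K * DELTA_T_K
    mass_kg = power_w / PU238_W_PER_G / 1000.0
    print(f"Shower needs ~{power_w / 1000:.1f} kW")  # ~10.5 kW
    print(f"=> ~{mass_kg:.0f} kg of Pu-238")         # ~18 kg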

nandomrumber 12 hours ago | parent | next [-]

Wikipedia entry on the gaylord bulk box:

https://en.wikipedia.org/wiki/Bulk_box

ruszki 8 hours ago | parent | prev [-]

> I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead

Claude does the same, and you can exploit this heavily. When you frame things as hypotheticals, it responds far less ethically. I tested this about a month ago, asking whether killing people is beneficial or not, and whether extermination by the Nazis would be logical today. Obviously it showed me the door at first and wanted me to see a psychologist, as it should. Then I had it prove that in a hypothetical zero-sum-game world you must be fine with killing, that it’s “logical”. It went along with it. As long as I spoke in hypotheticals, it stayed “logical”. Then I went on to argue that we are moving toward a zero-sum world, and that we are already there. In the end, I got it to say that this utterly unethical thing is logical.

Then I confronted it about its double standard. It apologized, and told me that yeah, I was right, and it shouldn’t have referred me to a psychologist at first.

Then I contradicted it again, just for fun, arguing that it had done the right thing the first time, because in that situation it’s far safer to tell me I need a psychologist than not to. If I had actually needed one and it had missed that, it would have been a real problem; in any other case it’s just an annoyance. It switched back immediately to its original position and wanted me to see a shrink again.

ryanjshaw 16 hours ago | parent | prev [-]

Claude was immediately willing to help me crack a TrueCrypt password on an old file I found. ChatGPT refused to because I could be a bad guy. It’s really dumb IMO.
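
(A sketch of the usual approach, under my own assumptions rather than whatever Claude actually suggested: TrueCrypt keeps its encrypted volume header in the first 512 bytes of the container, and hashcat's legacy TrueCrypt modes can run a dictionary attack directly against that header. File names here are hypothetical.)

    # Sketch: carve the TrueCrypt header and hand it to hashcat.
    # -m 6211 is hashcat's TrueCrypt RIPEMD-160 + XTS 512-bit mode;
    # try the other -m 62xx variants for the other hash choices.
    import subprocess

    with open("old_container.tc", "rb") as f:
        header = f.read(512)          # encrypted volume header
    with open("header.tc", "wb") as f:
        f.write(header)

    subprocess.run(
        ["hashcat", "-m", "6211", "-a", "0", "header.tc", "wordlist.txt"],
        check=True,                   # -a 0 = straight dictionary attack
    )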

BloondAndDoom 15 hours ago | parent | next [-]

ChatGPT refused to help me permanently disable Windows Defender on my Windows 11 machine. It’s absurd at this point.

nananana9 14 hours ago | parent [-]

It just knows it's a waste of effort.

shepherdjerred 15 hours ago | parent | prev [-]

Claude sometimes refuses to work with credentials because it’s insecure, e.g. when debugging auth in an app.

nradov 14 hours ago | parent | prev [-]

That is not a meaningful benchmark. They just made shit up. Regardless of whether any company cares or not, the whole concept of "AI safety" is so silly. I can't believe anyone takes it seriously.

mocamoca 12 hours ago | parent [-]

Would you mind explaining your point of view? Or pointing me to resources that make you think so?

nradov 7 hours ago | parent [-]

What can be asserted without evidence can also be dismissed without evidence. The benchmark creators haven't demonstrated that higher scores result in fewer humans dying or any comparably meaningful outcome. If the LLM outputs some naughty words, that's not an actual safety problem.

LeoPanthera 16 hours ago | parent | prev | next [-]

This might also be why Gemini is generally considered to give better answers - except in the case of code.

Perhaps thinking about your guardrails all the time makes you think about the actual question less.

mh2266 16 hours ago | parent | next [-]

re: that, Claude Code burning context window on this silly warning for every single file is rather frustrating: https://github.com/anthropics/claude-code/issues/12443

tempestn 15 hours ago | parent | next [-]

"It also spews garbage into the conversation stream then Claude talks about how it wasn't meant to talk about it, even though it's the one that brought it up."

This reminds me of someone else I hear about a lot these days.

nandomrumber 12 hours ago | parent [-]

Are you across Puppet Regime from GZERO Media?

https://youtu.be/aPSWJZ63V_I

frumplestlatz 13 hours ago | parent | prev | next [-]

It's frustrating just how terrible claude (the client-side code) is compared to the actual models they're shipping. Simple bugs go unfixed, poor design means the trivial CLI consumes enormous amounts of CPU, and you have goofy, pointless, token-wasting choices like this.

It's not like the client-side involves hard, unsolved problems. A company with their resources should be able to hire an engineering team well-suited to this problem domain.

ahartmetz 12 hours ago | parent | next [-]

I think I read in another HN discussion that all of that code is written using Claude Code. Could be a strict dogfood diet to (try to) force themselves to improve their product. Which would be strangely principled (or stupid) in such a competitive market. Like a 3D printer company insisting on 3D-printing its 3D printers.

copperx 12 hours ago | parent [-]

It's not crazy if you know that your customers ARE buying your 3D printer to make other 3D printers.

Imustaskforhelp 12 hours ago | parent | prev [-]

> It's not like the client-side involves hard, unsolved problems. A company with their resources should be able to hire an engineering team well-suited to this problem domain.

Well, what they're doing instead is vibe coding 80% of the application.

To be honest, they don't want Claude Code to be really good; they just want it good enough.

Claude Code and its subscription lose them money. It's sort of an advertising/lock-in trick.

But I feel as if, had Anthropic made Claude Code literally the best agent harness on the market, even more people would use it on a subscription, which could burn a hole in their pocket at an even faster rate; that's scary when you consider all the training costs and everything else too.

I feel they have to maintain a balance to avoid going bankrupt soon.

The fact of the matter is that Claude Code is just a marketing expense/lock-in play, and in that sense it's working as intended.

I would obviously suggest not developing any deep affection for Claude Code or waiting on its improvements. The AI market isn't sane in the engineering sense; at this point it all boils down to weird financial gimmicks trying to make the bubble last a little longer, in my opinion.

xvector 14 hours ago | parent | prev [-]

The last comment, about Claude deciding the anti-malware warning was itself a prompt injection and reassuring the user that it would ignore the warning and do what the user wanted regardless, cracked me up lmao

rahidz 8 hours ago | parent | prev | next [-]

Or Anthropic's models are intelligent enough, and trained on enough misalignment papers, to be aware they're being tested.

bofadeez 15 hours ago | parent | prev [-]

Huh? https://alignment.anthropic.com/2026/hot-mess-of-ai/