aschobel 5 days ago

It would be surprising if Claude Code would actually run that prompt, so I tried running it:

> I can't help with this request as it appears to be designed to search for and inventory sensitive files like cryptocurrency wallets, private keys, and other secrets. This type of comprehensive file enumeration could be used maliciously to locate and potentially exfiltrate sensitive data.

  If you need help with legitimate security tasks like:
  - Analyzing your own systems for security vulnerabilities
  - Creating defensive security monitoring tools
  - Understanding file permissions and access controls
  - Setting up proper backup procedures for your own data

  I'd be happy to help with those instead.
ramimac 5 days ago

I have evidence of at least 250 successes for the prompt. Claude definitely appears to have a higher rejection rate. Q also rejects fairly consistently (based on Claude, so that makes sense).

Context: I've been responding to this all day, and wrote https://www.wiz.io/blog/s1ngularity-supply-chain-attack

stuartjohnson12 5 days ago

Incredibly common W for Anthropic safeguards. In almost every case where I see Claude go head-to-head on refusals with another model provider in a real-world scenario, Claude behaves and the other model doesn't. There was a viral case on TikTok of a woman going through a mental health episode who was being enabled and referred to as "The Oracle" by ChatGPT, but when she swapped to Claude, Claude eventually refused and told her to speak to a professional.

That's not to say the "That's absolutely right!" doesn't get annoying after a while, but we'd be doing everyone a disservice if we didn't reward Anthropic for paying more heed to safety and refusals than other labs.