neuroelectron 12 hours ago

So Claude will reject 9 out of 10 prompts I give it and lecture me about safety, but somehow it was used for something genuinely malicious?

Someone make this make sense.

goalieca 12 hours ago | parent | next [-]

LLMs are rather easy to convince. There’s no formal logic embedded in them that provably restricts outputs.

The less believable part for me is that people persist long enough, and invest enough resources in prompting, to do something with an automated agent that doesn't have the potential to massively backfire.

Secondly, they claimed to use Anthropic's own infrastructure, which is silly. There's no doubt some capacity in China to do this. I would also expect incident response teams, threat detection teams, and other experts to be reporting this to Anthropic if Anthropic doesn't detect it themselves first.

It sure makes for good marketing to go out and claim such a thing, though. This is exactly the kind of FOMO-inducing headline that is driving the financing of the whole LLM revolution.

apples_oranges 11 hours ago | parent [-]

There are LLMs which have been modified to not reject anything at all; afaik this is possible with all LLMs. No need to convince.

(Granted, you have to have direct access to the LLM, unlike Claude where you just have the frontend, but the point stands. No need to convince whatsoever.)

comrade1234 12 hours ago | parent | prev | next [-]

Stop talking dirty with Claude.

cbg0 12 hours ago | parent | prev | next [-]

I've never had a prompt rejected by Claude. What kind of prompts are you sending where "9 out of 10" get rejected?

neuroelectron 10 hours ago | parent [-]

Basic system administration tasks, creating scripts to automate log scanning, service configuration, etc. Often it involves PII or payment data.

danielbln 12 hours ago | parent | prev [-]

I've rarely had Claude reject a prompt of mine. What are you prompting for to get a 90% refusal rate?