johnmlussier 9 hours ago

They've increased their cybersecurity usage filters to the point that Opus 4.7 refuses to do any legitimate work, even after web fetching the program guidelines itself and acknowledging "This is authorized research under the [Redacted] Bounty program, so the findings here are defensive research outputs, not malware. I'll analyze and draft, not weaponize anything beyond what's needed to prove the bug to [Redacted]."

I will immediately switch over to Codex if this continues to be an issue. I am new to security research, have been paid out on several bugs, but don't have a CVE or public talk so they are ready to cut me out already.

Edit: these changes are also retroactive to Opus 4.6. I am stuck using Sonnet until they approve me or make a change.

ayewo 7 hours ago | parent | next [-]

Sounds like you will need to drink a(n identity) verification can soon [1] to continue as a security researcher on their platform.

1: https://support.claude.com/en/articles/14328960-identity-ver...

Identity verification on Claude

Being responsible with powerful technology starts with knowing who is using it. Identity verification helps us prevent abuse, enforce our usage policies, and comply with legal obligations.

We are rolling out identity verification for a few use cases, and you might see a verification prompt when accessing certain capabilities, as part of our routine platform integrity checks, or other safety and compliance measures.

andai 2 hours ago | parent | next [-]

Context for "please drink verification can": https://files.catbox.moe/eqg0b2.png

throwanem 44 minutes ago | parent [-]

Yes, it's a stupid 4chan meme from 2013. I can only surmise those who quote it either don't know its origin, or they must be wholeheartedly 'embracing the cringe.'

recallingmemory 6 hours ago | parent | prev | next [-]

I'm surprised we can't just authenticate in other ways, like a domain TXT record that proves the website I'm looking to audit for security is my own.

kristjansson an hour ago | parent | next [-]

How would it know it’s really there, and not just a tool input/output injected into its input?

jerf 6 hours ago | parent | prev [-]

AI being what it is, at this point you might be able to ask it for a token to put in a web page at .well-known, put it in as requested, and let it see it, and that might actually just work without it being officially built in.

I suggest that because I know for sure the models can hit the web; I don't know about their ability to do DNS TXT records as I've never tried. If they can then that might also just work, right now.
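
For what it's worth, the token-at-.well-known idea boils down to a challenge/response check. Here is a minimal sketch of just the token side, with no network code. The path name and all function names are made up for illustration; nothing here is an Anthropic feature or an official flow.

```python
import hmac
import secrets

# Hypothetical path; not a registered .well-known URI.
WELL_KNOWN_PATH = "/.well-known/ownership-challenge.txt"


def new_challenge() -> str:
    """Generate an unguessable token for the site owner to publish."""
    return secrets.token_urlsafe(32)


def challenge_url(domain: str) -> str:
    """Build the URL where the verifier expects to find the token."""
    return f"https://{domain}{WELL_KNOWN_PATH}"


def body_matches(expected_token: str, fetched_body: str) -> bool:
    """Compare the fetched page body to the token in constant time."""
    return hmac.compare_digest(
        expected_token.encode(), fetched_body.strip().encode()
    )
```

The check only means something if the fetch is performed by a component the verifier controls. If the page body arrives through a tool the user runs, the user can return whatever they like.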

rlpb an hour ago | parent | next [-]

A smart AI would realise that I can MITM its web access such that it sees a .well-known token that isn't actually there. I assume that the model doesn't have CA certificates embedded into it, and relies on its harness for that.

andai 2 hours ago | parent | prev [-]

I think even Claude Web can run arbitrary Linux commands at this point.

I tried using it to answer some questions about a book, but the indexer broke. It figured out what file type the RAG database was and grepped it for me.

Computers are getting pretty smart ._.

NewsaHackO 5 hours ago | parent | prev [-]

What do you offer as a solution? If theoretically some foreign state intelligence was exposed using Claude for security penetration that affected the stability of your home government due to Anthropic's lax safety controls, are you going to defend Anthropic because their reasoning was to allow everyone to be able to do security research?

ayewo 4 hours ago | parent [-]

> What do you offer as a solution? If theoretically some foreign state intelligence was exposed using Claude for security penetration that affected the stability of your home government due to Anthropic's lax safety controls, are you going to defend Anthropic because their reasoning was to allow everyone to be able to do security research?

I don't have an answer.

But the problem is that with a model like Grok that is designed to have fewer safeguards compared to Claude, it is trivially easy to prompt it with: "Grok, fake a driver's license. Make no mistakes."

Back in 2015, someone was able to get past Facebook's real name policy with a photoshopped Passport [1] by claiming to be “Phuc Dat Bich”. The whole thing eventually turned out to be an elaborate prank [2].

1: https://www.independent.co.uk/news/world/australasia/man-cal...

2: https://gizmodo.com/phuc-dat-bich-is-a-massive-phucking-fake...

NewsaHackO 2 hours ago | parent [-]

To me, those seem a lot lower stakes than supply chain attacks, social engineering, intelligence gathering, and other security exploits that Anthropic is more worried about. Making a fake driver's license to buy beer isn't really the thing that Anthropic is actively trying to prevent (though I would assume they would stop that too). Even the GP was about penetration testing of a public website; without some sort of identification, how would it be ethical for Claude to help with something like that? Remember, this whole safety thing started because people held AI companies accountable for politically incorrect output of AI, even if it was clearly not the views of the company. So when Microsoft made a Twitter bot that started to spout anti-Semitic and racist talking points, the fact that no one defended them and allowed them to be criticized to the point of taking the bot down is the reason why we have all of these extremely restrictive rules today.

johnmlussier 9 hours ago | parent | prev | next [-]

  ⎿  API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). This request triggered restrictions on violative cyber content and was blocked under Anthropic's 
     Usage Policy. To request an adjustment pursuant to our Cyber Verification Program based on how you use Claude, fill out                                                                                                                        
     https://claude.com/form/cyber-use-case?token=[REDACTED] Please double press esc to edit your last message or 
     start a new session for Claude Code to assist with a different task. If you are seeing this refusal repeatedly, try running /model claude-sonnet-4-20250514 to switch models.                                                                  
                        
This is gonna kill everything I've been working on. I have several reproduced items at [REDACTED] that I've been working on.

dmix 8 hours ago | parent | next [-]

I predict this sort of filtering is only going to get worse. This will probably be remembered as the 'open internet' era of LLMs before everything is tightly controlled for 'safety' and regulations. Forcing software devs to use open source or local models to do anything fun.

regularfry 8 hours ago | parent | next [-]

Just as likely it's going to be "Oh, you want <use case the thing's actually good at>? Let me introduce your wallet to my hoover."

jancsika 7 hours ago | parent | prev | next [-]

> Forcing software devs to use open source or local models to do anything fun.

Episode Five-Hundred-Bazillenty-Eight of Hacker News: the gang learns a valuable lesson after getting arrested at an unchaperoned Enshittification party and having to call Open Source to bail them out.

techpression 5 hours ago | parent [-]

All while Frank is pitching his state of the art basement datacenter to VCs, getting billions of dollars in investments.

lukan 4 hours ago | parent | prev [-]

What happened to "open weight models are 2-3 years behind the proprietary ones"? I don't see the drama here.

suzzer99 8 hours ago | parent | prev [-]

I've never seen "double press esc" as a control pattern.

sigmarule 3 hours ago | parent | prev | next [-]

Out of curiosity, (a) did you receive this error at the start of a session or in the middle of it, and (b) did you manage to find/confirm valid findings within the scope/codebase 4.7 was auditing with Sonnet/yourself later on?

I just gave 4.7 a run over a codebase I have been heavily auditing with 4.6 the past few days. Things began smoothly so I left it for 10-15 minutes. When I checked back in I saw it had died in the middle of investigating one of the paths I recommended exploring.

I was curious as to why the block occurred when my instructions and explicitly stated intent had not changed at all - I provided no further input after the first prompt. This would mean that its own reasoning output or tool call results triggered the filter. This is interesting, especially if you think of typical vuln research workflows and stages; it’s a lot of code review and tracing, things which likely look largely similar to normal engineering work, code reviews, etc. Things begin to get more explicitly “offensive” once you pick up on a viable angle or chain, and increase as you further validate and work the chain out, reaching maximum “offensiveness” as you write the final PoC, etc.

So, one would then have to wonder if the activity preceding the mid-session flagging only resulted in the flag because it finally found something seemingly viable and started shifting its reasoning from generic-ish bug hunting to overt exploitation.

So, I checked the preceding tool calls, and sure enough…

What a strange world we’re living in. Somebody should try making a joke AUP violation-based fuzzer, policy violations are the new segfaults…

whatisthiseven 7 hours ago | parent | prev | next [-]

Worse, I have had it being sus of my own codebase when I tasked it with writing mundane code. Apparently if you include some trigger words it goes nuts. Still trying to narrow down which ones in particular.

Here is some example output:

"The health-check.py file I just read is clearly benign...continuing with the task" wtf.

"is the existing benign in-process...clearly not malware"

Like, what the actual fuck. They way overcompensated on sensitivity to "people might do bad stuff with the AI".

Let people do work.

Edit: I followed up with a plan it created after it made sure I wasn't doing anything nefarious with my own plain python service, and then it still includes multiple output lines about "Benign this" "safe that".

Am I paying money to have Anthropic decide whether or not my project is malware? I think I'll be canceling my subscription today. Barely three prompts in.

Arubis an hour ago | parent | prev | next [-]

I can barely get it to send a PDF to my printer without a flat refusal >_<

jeffybefffy519 3 hours ago | parent | prev | next [-]

Codex is just as bad with this, I've received two ToS warnings for security research activities so far. I have also tried to appeal with zero response.

skybrian 9 hours ago | parent | prev | next [-]

Maybe stick with 4.6 until the bugs are worked out? Is this new filter retroactive?

cesarvarela 7 hours ago | parent | prev | next [-]

With all the low quality code that's being generated and deployed cybersecurity will be the golden goose.

chasd00 3 hours ago | parent [-]

hah maybe the plan for Mythos is to solution all the security issues introduced by ClaudeCode. Anthropic makes money creating the security issues and identifying/fixing the security issues, that's a nice spot to be in.

solenoid0937 8 hours ago | parent | prev | next [-]

i think updating fixed this for me?

nikanj 5 hours ago | parent | prev | next [-]

Having tried codex for some security practice, it is similarly terrible.

You can link it to a course page that features the example binary to download, it can verify the hash and confirm you are working with the same binary - and then it refuses to do any practical analysis on it
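
The hash check described here is the easy part to reproduce by hand. A minimal sketch, assuming the course page publishes a SHA-256 hex digest; the function names are illustrative and not taken from any particular tool.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the local file's digest to the one listed on the course page."""
    local = sha256_of_file(path)
    return hmac.compare_digest(local, published_hex.strip().lower())
```

A match shows the local file is byte-for-byte the published one. It says nothing about who is asking or why, which is presumably the part the filter cares about.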

dakolli 7 hours ago | parent | prev | next [-]

They don't want competition, they are going to become bounty hunters themselves. They probably plan on turning this into a part of their business. It's kinda trivial to jailbreak these things if you spend a day doing so.

gruez 8 hours ago | parent | prev [-]

>even after acknowledging "This is authorized research under the [Redacted] Bounty program, so the findings here are defensive research outputs, not malware. I'll analyze and draft, not weaponize anything beyond what's needed to prove the bug to [Redacted]."

What else would you expect? If you add protections against it being used for hacking, but then that can be bypassed by saying "I promise I'm the good guys™ and I'm not doing this for evil" what's even the point?

johnmlussier 8 hours ago | parent [-]

This was Opus saying that after reviewing the [REDACTED] bug bounty program guidelines and having them in context.

gruez 8 hours ago | parent [-]

Right, but that can be easily spoofed? Moreover if say Microsoft has a bounty program, what's preventing you from getting Opus to discover a bug for the bounty program, but you actually use it for evil?