maldev 4 hours ago

It won't even review a cyber security blog post I wrote. Absolutely worthless and pitiful guardrails.

InsideOutSanta 4 hours ago | parent | next [-]

I'm having it review a project Opus 4.8 created. No security review, just "look for general issues, performance problems, missing features, etc." It spawned about twenty background tasks. It's still going, but so far, one has completed, and four have failed with guardrail messages. Nothing special, just stuff like reviewing the API:

Fable 5's safeguards flagged this message (https://www.anthropic.com/legal/aup). They may flag safe, normal content as well. These measures let us bring you Mythos-level capabilities sooner, and we're working to refine them. Claude Code can't respond to this request with Fable 5.

Try rephrasing the request in a new session or change your model.

This is incredibly stupid, particularly because I didn't write the request in the first place. Fable wrote it when it spawned the background task. How am I supposed to rephrase it?

Fable probably told itself to do a security review, and then failed itself for trying to do a security review, and now it's telling me not to tell it to do a security review.

ritzaco 4 hours ago | parent | next [-]

yeah I'm also getting this for standard dev work, anything with kubernetes etc

completely nerfs the model because you can't let it do stuff over a few hours unattended, since 90% of the time it's going to switch to opus in the first 10 minutes anyway

so seems best thing now is to have it write plans and then default to using opus for work anyway?

metadata 3 hours ago | parent [-]

It is nerfed even just for plans. It switches to Opus in the first few minutes of me trying to build a plan to extract a component out of my larger codebase.

I'm trying to minimize the privileged-access codebase, and I was careful not to mention security explicitly.

Melatonic 43 minutes ago | parent | prev | next [-]

Do you have to pay for the tokens used for the safeguard flagged stuff?

vanchor3 2 hours ago | parent | prev | next [-]

I once had Fable flag on one of the three-word session names that Cowork auto generated at the beginning.

unshavedyak 3 hours ago | parent | prev | next [-]

It's honestly kinda interesting. Now we're at a point where SOTA model companies aren't the ones who release the best tech, but who release the best and actually usable tech.

A worse product could win right now if it simply does as it's asked.

trunnell 4 hours ago | parent | prev | next [-]

Blame Amazon and the White House

ljlolel 3 hours ago | parent [-]

Nah it was refusing plenty of stuff before the White House

InsideOutSanta 3 hours ago | parent [-]

It did, but it's a lot worse now. A fricken lot.

ignoramous 4 hours ago | parent | prev [-]

> Fable wrote it when it spawned the background task. How am I supposed to rephrase it?

Can the harness auto-rephrase? I imagine, doing so will burn through tokens though.

aliasxneo 4 hours ago | parent | next [-]

> I imagine, doing so will burn through tokens though.

What a surprisingly beneficial consequence for Anthropic.

qurren 4 hours ago | parent | prev [-]

Maybe set up Codex to rephrase stuff and remote control the Claude Code terminal?

ctoth 4 hours ago | parent | prev | next [-]

Cybersecurity? It won't even help me work on my speech synthesizer[0]!

I guess? If you squint? DSP code could look a little like AI training code? ... Er. No. Not really. I'm pretty lost on this one.

The task was literally just to compare against the "make a beautiful voice" plan, see what we've implemented, see what's left to do, and make recommendations for low-hanging fruit and anything we've done wrong so far. (aaaaand ... downgrade. At least it wasn't silent.)

[0]: https://github.com/ctoth/qlatt

ljlolel 3 hours ago | parent | prev [-]

That's why we made OpenPatcher, which uses open source models to give you consistent code review and fixes: https://x.com/ryaneshea/status/2072332311971197077