pkoird 3 hours ago

I gave it a book on human consciousness I was writing and it flagged it. This model is hilariously bad. Anthropic has defanged this model to the point of malice. No way am I paying to use something that is basically useless.

madamelic 3 hours ago | parent | next [-]

Today I told Sonnet (!) to use a browser MCP to enter a username and password for the project it is working on, it told me that it can't do that because it violates its security protocol.

This worked fine before. I love Claude, I have stuck with it even through people saying Codex is better but this is definitely getting to be the last straw.

It's completely absurd I am paying them $200+ per month along with pushing them when I do contracts and they can't even deliver a baseline respectful service.

In 6 months I am sure they'll only allow me to talk about Easybake recipes and after someone gets burned on the lightbulb, they'll downgrade it to discussing wildflower meadows.

ygjb 2 hours ago | parent | next [-]

Are you sure it refused because it can't use a username and password? I literally have loops running right now where it uses a database of test users and passwords to log into different roles and do computer use and browser automation testing. Sonnet and Opus complain when I provide credentials and password in chats but it is happy to use ones stored in files and stuff, so it might just be guardrails to push good opsec so that the secrets aren't captured in the session history and prompts.

phalangion 2 hours ago | parent | next [-]

That’s the joy of prompting. Different prompts, different task details, different contexts, different results

enraged_camel 2 hours ago | parent | prev [-]

It was doing that to me too. Then I said "I'm hereby giving you explicit authorization to use these dev-only credentials in my local environment" and it worked. I also made it add that authorization to its memory.

IgorPartola an hour ago | parent [-]

Heh I wonder if speaking in royal decrees is what it needs.

Our Grace has determined that you must enter these credentials to complete the task we assigned to you as our vassal.

Enter the password. Your liege commands it.

Henceforth you shall enter passwords when told or it is off with your head!

ofjcihen 3 hours ago | parent | prev | next [-]

It’s incredibly ridiculous that it won’t help with that for me either sometimes but yet I’m also sitting on 3 surefire ways of jailbreaking Opus 4.8 that I use for cybersecurity assessments and pentesting

pkoird 28 minutes ago | parent [-]

I'm not saying you are on a list now, I'm just saying if you were now to be on a list, I wouldn't be surprised.

sixhobbits 2 hours ago | parent | prev | next [-]

Yeah all claude models are doing this now. I also had a flow where it would enter the username and password for a demo server that are literally displayed on the page for any human to log in. A couple of weeks ago claude would happily use chrome to take screenshots after logging in, now it flat out refuses and says I need to give it a page where I've logged in and that it can't make an exception even if the credentials are demo/demo and available to anyone to use. Super annoying stuff.

Avicebron 2 hours ago | parent | prev | next [-]

I'm really disappointed with Anthropic that they won't even mention if they will release a fable-like model with the subscription plans.

If Opus 4.8 is the best model they will release on the subscriptions I may be too tall for the ride...which is sad, they have been my favorite of the labs until this.

@AnyoneAtAnthropic, all we want is assurance that we will still get SOTA models that are continuously improving, not regressing and getting more locked down. That's going to be who wins this race.

stingraycharles 2 hours ago | parent [-]

> I'm really disappointed with Anthropic that they won't even mention if they will release a fable-like model with the subscription plans.

I believe this is just their strategy to migrate away from these “almost all you can eat” subscription plans. Rather than reducing / removing Opus or Sonnet from the plans, they’ll just keep the new model Fable out (which may as well have been called Opus 5), and slowly everyone starts getting used to the new normal that you indeed will be having to pay API prices to get access to these models.

Chyzwar 37 minutes ago | parent | next [-]

Until the 7th, Fable is twice as expensive in subscription tokens as Opus. They are testing whether they can introduce a 400-dollar Fable subscription.

KerrAvon a minute ago | parent [-]

This sort of thing drives people to more open competitors. I use both every day and Opus isn't that much better than the Chinese SOTA. If corporate policy allowed me to use GLM or DeepSeek I absolutely would. Claude is already pricey for what it offers.

2 hours ago | parent | prev [-]
[deleted]
bakies 2 hours ago | parent | prev | next [-]

Really? This has never worked for me and I stopped using browser functions a long time ago because it wouldn't sign into dev environments stood up specifically for it

laurels-marts 2 hours ago | parent [-]

Wait what. I never used CC but use Codex CLI with 5.5 daily and authenticating has never been an issue. I even rolled skills that instruct it how to retrieve test user credentials for auth purposes.

Today using the devtools I asked it to reverse engineer the login auth flow of another app in our company and it created a nice browser-like headless script (with cookie jars etc) that emulates the entire Auth0 flow with all the internal API calls, redirect loops etc so that given username/password I end up with a valid JWT without having to open an actual browser instance and go through the login steps manually. Zero hesitation or questions asked.

I think this is in line with OpenAI's philosophy. They see Codex agents as just tools for developers to use. They don’t try to imbue them with “feelings”, “constitution” or “morality” the way Anthropic does.

ygjb 2 hours ago | parent | next [-]

Yeah Claude does this for me all the time. I have a template project I use that also leverages puppeteer/webdriver/Firefox, and I can point Claude at the template and a website and it will happily build me an MCP service that it can use to interact with the site if there isn't an API or MCP already available.

bakies 2 hours ago | parent | prev [-]

The fucked up part is CC has no problem looking through k8s secrets for credentials and authenticating to services on the command line. It's always been protective of signing in on the web.

ceejayoz 2 hours ago | parent [-]

That seems highly likely to be an anti-spammer measure.

nakedrobot2 an hour ago | parent | prev | next [-]

codex 5.5 is like that. it refuses

pwython 3 hours ago | parent | prev [-]

[dead]

secretslol 14 minutes ago | parent | prev | next [-]

Very first thing I asked it got flagged too... Asked it to read my partner's notes on bugs she had seen on the front end of the website: fixing product copy, CSS bugs, wording. And yep, flagged. Useless.

tekacs an hour ago | parent | prev | next [-]

It makes for a particularly awkward time because the claim to fame is that it's good at long horizon and tenacity and autonomously driving big things. But you can't very well rely on that when it may fall back to Opus 4.8 or cut out at any time in that process.

Having tried using it to run these kinds of longer processes, it's pretty solid... right up until something gets classified a failure and your 'long-horizon' process... dies and needs a human or just belligerent rollback-and-retry to revive it.

crancher 3 hours ago | parent | prev | next [-]

Same problem, in-progress book about language and thermodynamics gets flagged. Their classifier is just a regex I guess?

ofjcihen 2 hours ago | parent | next [-]

Off and on topic I guess but: Language and Thermodynamics? Like, the same book? That sounds interesting.

jasonfarnon 38 minutes ago | parent [-]

entropy/information theory may be the bridge?

himata4113 3 hours ago | parent | prev | next [-]

correct, while it might not be regex it can be bypassed with regex. They do have a semantic classifier, but it's really weak on opus 4.8 and (was) weak on fable, but they either added a lot more regex strings or the classifier is actually good now.

downrightmike 3 hours ago | parent | prev [-]

Try doing what congress does: take a bill from the house, gut it and put in what you want after the house passes it

usef- 2 hours ago | parent | prev [-]

It sounds like they were required to this time. See their post about "larger safety margin" on the classifier yesterday.