paulatreides 4 hours ago

It triggered for my... Zigbee home automation and Home Assistant logs, so my agent was constantly downgraded to Opus 4.8 even after I changed it back. The false positives never stopped. "Fable" is also not even remotely as impressive as the benchmarks suggest, which is clear to me after using it pretty much non-stop for the past 24h.

lambda an hour ago | parent | next [-]

I suspect it's even more expensive to run than they are charging for. These safeguards are just an excuse to get people to use it less, because it's not actually sustainable to use. They want to tempt people to consider them the leader, and it may actually be somewhat stronger, but too expensive to actually use at scale, so they nerf it by downgrading you constantly.

reactordev 4 hours ago | parent | prev | next [-]

This. Fable is exactly that: a fable.

an hour ago | parent | prev | next [-]
[deleted]
fluidcruft 3 hours ago | parent | prev | next [-]

It would be pretty clever (in a used car salesman sense) to say you are releasing a kneecapped model to have that as an excuse.

DrewADesign 3 hours ago | parent [-]

Being (probably overly) cynical about their recent bout of safety handwringing, I think they a) have increased the hype as much as humanly possible about their incremental improvements sprinkled with the occasional regression, b) know they will soon have to multiply their prices several times when the VC subsidies dry up, and c) will probably still need to partially close the faucet on compute. They’re priming us for a heroic explanation of why their service (not necessarily the models, the service) is simultaneously becoming a lot more expensive AND shittier. “We’ve largely failed to deliver on 5 years of promises that this will reduce knowledge work labor costs dramatically after wasting hundreds of billions of dollars… sorry” is a death knell. However, “We’ve decided to not deliver on 5 years of promises after wasting billions of dollars… for safety… but keep those investments rolling in” is like crack to the true believers.

kraakf06 an hour ago | parent | prev | next [-]

False positives like this are probably more damaging than the guardrails themselves. If engineers can't predict when a model will switch behavior, it becomes difficult to trust it in production workflows.

NewsaHackO 3 hours ago | parent | prev [-]

It has to be sort of impressive, given that you tried so hard to use it instead of the regular Opus.

paulatreides 3 hours ago | parent | next [-]

Some people made grandiose claims about its capabilities and I wanted to experience it myself.

anigbrowl 2 hours ago | parent [-]

OK, but for almost 24h straight? That seems a little obsessive, and not in the good way.

borski an hour ago | parent [-]

Getting excited about the announcement of new capabilities is very normal.

People used to wait in line all night to buy an iPhone. This isn’t that different.

californical 3 hours ago | parent | prev | next [-]

I’ve also been trying to use it a lot due to all of the hype, but when I compared it side-by-side on a specific problem against Opus, I think that the solution Opus came to was cleaner and more accurate, although also more verbose.

Small sample size, but if Mythos/Fable was that much better, I feel like it should’ve given me an obviously better answer than Opus.

punchmesan 3 hours ago | parent | prev | next [-]

Considering that this is a brand new release of a frontier model that Anthropic is hyping hard, I'm not sure that the conclusion to draw from their repeated attempts to use it is that it's impressive... Anthropic is promising that it's impressive and we're all trying to test it out.

I, for one, have tried using it several times today and the guardrails kept switching the model back to Opus, so I have no clue if it's impressive or not.

flyingcircus3 3 hours ago | parent | prev [-]

It isn't reasonable to infer that OP was claiming to have universally been unimpressed by every facet of Fable, and now some unrelated impressiveness is the evidence of their false claims.