Jcampuzano2 8 hours ago

I'm struggling to understand why I'd ever use this instead of just running Opus at a lower effort level, given that on many of the listed benchmarks its cost per task rises above Opus's at anything higher than medium effort.

The only case I can think of is when someone is out of Opus credits. Of course there are API billing use cases, but I'd probably still just use Opus on low.

itopaloglu83 8 hours ago | parent | next [-]

More and more I find myself trying to stop Opus from doing something stupid, and at every turn I need to tell it to stop overcomplicating things.

I think the models are being optimized for wealth extraction from users and companies, instead of solving problems.

I don't know why Opus would try to create an entire library when I told it specifically to do something simple that would take 2-3 lines of Python.

__natty__ 6 hours ago | parent | next [-]

> More and more I find myself trying to stop Opus from doing something stupid, and at every turn I need to tell it to stop overcomplicating things

Yeah, those are my thoughts as well. I feel it’s great for benchmarks and some tasks, while in others it tries to spend as many tokens as possible, overcomplicates the task, and needs a second or third round of steering, which costs money. At the scale Anthropic operates, I bet that adds up to a huge amount of extra money just to make sure their model works.

Aeolun 5 hours ago | parent [-]

It’s really weird when you go to one of the open models and suddenly the same context window stretches nearly 3-4 times as long.

indoordin0saur 3 hours ago | parent | prev | next [-]

Yeah. Mine really likes to read excess code. I'll ask it questions like "If I move all these three ETL jobs into a subfolder will it break anything?" It'll start by giving me the simple answer, but then go on to consider another question and realize it requires reading my entire other repo that handles all of my cloud infrastructure. And it'll proceed to read through tens of thousands of lines of Terraform.

post-it 8 hours ago | parent | prev | next [-]

> I don't know why Opus would try to create an entire library when I told it specifically to do something simple that would take 2-3 lines of Python.

Because it reasons in one direction. First it encounters some kind of issue that might make the 2-3 lines of Python not work, and then it goes on to plan B, which is making a library, but it doesn't circle back and compare the effort of making the library to working around whatever might make the 2-3 lines not work. Except sometimes it does, because it's inscrutable.

MagicMoonlight 5 hours ago | parent | prev | next [-]

[dead]

3ffs 4 hours ago | parent | prev [-]

There were many of us who predicted and saw this months ago.

Should I refer to those who are only realising this now as stupid? I believe so.

It's not wealth extraction btw; the correct economic term is capturing/extracting surplus. They have a wide range of schemes, quality discrimination being one very obvious one.

Swear most of you on here pretend to be soooo smart when you def are not.

theptip an hour ago | parent | prev | next [-]

Are we reading the same chart? They have Sonnet <= high as Pareto dominant on $/perf.

You have to test each task obviously but it is not a bad model on its face.
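
For anyone unsure what "Pareto dominant on $/perf" means here, a minimal sketch: a configuration is dominated if another one is at least as cheap and at least as good, and strictly better on one of the two. The names and numbers below are made up for illustration and are NOT taken from the chart being discussed.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    cost: float   # dollars per task (lower is better)
    score: float  # benchmark score (higher is better)


def pareto_front(configs):
    """Return the configs that no other config dominates, cheapest first."""
    front = []
    for c in configs:
        dominated = any(
            o.cost <= c.cost and o.score >= c.score
            and (o.cost < c.cost or o.score > c.score)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost)


# Hypothetical data points, purely illustrative.
configs = [
    Config("model-a/low", 0.10, 60.0),
    Config("model-a/high", 0.30, 72.0),
    Config("model-b/low", 0.35, 70.0),   # costs more and scores less than model-a/high
    Config("model-b/high", 0.90, 80.0),
]

front_names = [c.name for c in pareto_front(configs)]
print(front_names)
```

With these invented numbers, "model-b/low" drops off the front because "model-a/high" is both cheaper and better; whether the real chart looks like that is exactly what the thread is arguing about.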

nicce 8 hours ago | parent | prev | next [-]

Older Opus models will likely get deprecated, and then over time this becomes the cheapest model available. That is how prices are currently increased.

ChrisLTD 6 hours ago | parent | next [-]

Yeah... Sonnet becomes the new cheap model, and some Fable class model becomes the more expensive/better one.

theptip an hour ago | parent | prev [-]

Wat. Price/perf has been going down massively over the last few years.

c0m47053 5 hours ago | parent | prev | next [-]

Task-specific benchmarks don't reflect a lot of day-to-day agentic use cases in my experience. If you are working on a series of discrete tasks and can clear context after each one and move to the next, you might get that sort of efficiency from Opus at low effort. I often find that when working through a real problem, iterating and discovering, context length can creep up, and that is where Opus tends to get expensive.
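
A rough sketch of why context creep gets expensive, assuming every turn resends the full conversation as input and ignoring prompt caching and output tokens (both of which change the real bill). The function name and all numbers are hypothetical.

```python
def total_input_tokens(turns, base_tokens, tokens_added_per_turn):
    """Sum the input tokens billed across a session where each turn
    resends the whole context and then grows it by a fixed amount."""
    total = 0
    context = base_tokens
    for _ in range(turns):
        total += context                  # this turn pays for everything so far
        context += tokens_added_per_turn  # context keeps growing
    return total


# Hypothetical session: start at 1,000 tokens, add 1,000 per turn.
short_session = total_input_tokens(10, 1_000, 1_000)
long_session = total_input_tokens(20, 1_000, 1_000)
print(short_session, long_session)
```

Under these assumptions, doubling the number of turns roughly quadruples the input tokens billed, which is why clearing context between discrete tasks matters so much for cost.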

phainopepla2 7 hours ago | parent | prev | next [-]

Looking at some of the agentic coding benchmarks on the system card[0], pages 117-118, it seems that running it at low outperforms Sonnet 4.6 at any level, and is a good deal cheaper as well. So on low it could be a good workhorse for an Opus-planned task.

[0] https://www.anthropic.com/claude-sonnet-5-system-card

enraged_camel 8 hours ago | parent | prev | next [-]

Speed is a huge reason. Sometimes you just need some simple tasks to get done fast, and waiting 30-60 seconds for Opus to even start thinking can really slow things down.

humanymous 8 hours ago | parent [-]

Opus with low reasoning effort would be faster than Sonnet with high reasoning effort, so that won't exactly help. I think it just comes down to what each model is optimized for.

SirMaster 8 hours ago | parent | prev [-]

Maybe it's not for you? I don't pay, so I can't even use Opus... So this is an upgrade over Sonnet 4.6 for me.