Remix.run Logo
wongarsu 3 hours ago

Keep in mind that the people who experience issues will always be the loudest.

I've overall enjoyed 4.6. On many easy things it thinks less than 4.5, leading to snappier feedback. And 4.6 seems much more comfortable calling tools: it's much more proactive about looking at the git history to understand the history of a bug or feature, or about looking at online documentation for APIs and packages.

A recent claude code update explicitly offered me the option to change the reasoning level from high to medium, and for many people that seems to help with the overthinking. But for my tasks and medium-sized code bases (far beyond hobby but far below legacy enterprise) I've been very happy with the default setting. Or maybe it's about the prompting style, hard to say

evilhackerdude 3 hours ago | parent | next [-]

keep in mind that people who point out a regression and measure the actual #tok, which costs $money, aren't just "being loud" — someone diffed session context usaage and found 4.6 burning >7x the amount of context on a task that 4.5 did in under 2 MB⁣.

svachalek 2 hours ago | parent [-]

It's not that they don't have a point, it's that everyone who's finding 4.6 to be fine or great are not running out to the internet to talk about it.

marcus_cemes 2 hours ago | parent [-]

Being a moderately frequent user of Opus and having spoken to people who use it actively at work for automation, it's a really expensive model to run, I've heard it burn through a company's weekend's credit allocation before Saturday morning, I think using almost an order of magnitude more tokens is a valid consumer concern!

I have yet to hear anyone say "Opus is really good value for money, a real good economic choice for us". It seems that we're trying to retrofit every possible task with SOTA AI that is still severely lacking in solid reasoning, reliability/dependability, so we throw more money at the problem (cough Opus) in the hopes that it will surpass that barrier of trust.

SatvikBeri 3 hours ago | parent | prev | next [-]

I've also seen Opus 4.6 as a pure upgrade. In particular, it's noticeably better at debugging complex issues and navigating our internal/custom framework.

drcongo 3 hours ago | parent [-]

Same here. 4.6 has been considerably more dilligent for me.

AustinDev 3 hours ago | parent [-]

Likewise, I feel like it's degraded in performance a bit over the last couple weeks but that's just vibes. They surely vary thinking tokens based on load on the backend, especially for subscription users.

When my subscription 4.6 is flagging I'll switch over to Corporate API version and run the same prompts and get a noticeably better solution. In the end it's hard to compare nondeterministic systems.

perelin 3 hours ago | parent | prev | next [-]

Mirrors my experience as well. Especially the pro-activeness in tool calling sticks out. It goes web searching to augment knowledge gaps on its own way more often.

galaxyLogic 2 hours ago | parent | prev [-]

Do you need to upload your git for it to analyuze it? Or are they reading it off github ?

gpm an hour ago | parent [-]

They're probably running it with a claude code like tool and it has a local (to the tool, not to anthropic) copy of the git repo it can query using the cli.