Interesting data. I use Claude Code daily and noticed 4.7 feels different, but couldn't put numbers to it like this.

Does your one-shot rate account for how much context you give it? I keep a detailed CLAUDE.md with project conventions and am wondering whether that closes the gap at all, or if 4.7 just struggles regardless.

The fewer-tools-per-turn thing worries me. Are you seeing it hallucinate project structure more? In my sessions it seems to want to figure things out in its head instead of actually reading the files.

More expensive and lower first-try accuracy is rough. Are you planning to stick with 4.7 or go back?

alwillis a day ago

Anthropic provides details on the differences between Opus 4.7 and 4.6, including that Opus 4.7 doesn't call tools as frequently as 4.6 because it is more capable. Depending on the task at hand, that could be a good thing or a not-so-good one [1].

For example, regarding instruction following:

Claude Opus 4.7 interprets prompts more literally and explicitly than Claude Opus 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make.

[1]: https://platform.claude.com/docs/en/build-with-claude/prompt...

	alegd a day ago
		That explains a lot, actually. So the fewer tool calls are by design. Makes sense, but for coding specifically I'd rather it read my files than guess what's in them.

agentseal 18 hours ago

The one-shot rate doesn't factor in context size directly; it just tracks whether an edit succeeded without retries. That said, a detailed CLAUDE.md probably helps both models equally since the context is the same either way. Would be interesting to isolate that though.
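For anyone who wants to reproduce the metric, here is a minimal sketch of a one-shot rate as described above: the share of edits that succeeded with zero retries. The `EditAttempt` record, its field names, and the sample values are all hypothetical and made up for illustration; this is not the actual tracking code or real measurement data.

```python
from dataclasses import dataclass


@dataclass
class EditAttempt:
    """Hypothetical record for one logical edit in a session."""
    edit_id: str   # illustrative identifier, not from any real log format
    retries: int   # how many retries were needed before the edit succeeded


def one_shot_rate(attempts: list[EditAttempt]) -> float:
    """Return the fraction of edits that succeeded without any retry."""
    if not attempts:
        return 0.0  # assumption: report 0.0 when there is nothing to measure
    first_try = sum(1 for a in attempts if a.retries == 0)
    return first_try / len(attempts)


# Made-up sample: two of four edits landed on the first try.
sample = [
    EditAttempt("a", 0),
    EditAttempt("b", 2),
    EditAttempt("c", 0),
    EditAttempt("d", 1),
]
print(one_shot_rate(sample))  # 0.5
```

Isolating the CLAUDE.md effect would presumably mean computing this separately for sessions with and without the file, per model, and comparing the four numbers.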

I have started rolling back to 4.6 for some important tasks, since I have worked with it for a long time, but I am still using 4.7 for some fresh tasks.

agentseal 18 hours ago

On the fewer tools per turn: yeah, I think that lines up with what the other reply mentioned about 4.7 being more "in its head." I have not specifically tracked hallucinated project structure, but the higher retry rate suggests it is getting things wrong more often when it skips the read step.