I haven't tried this model. We have a corporate plan covering several models, and the company is liberal with our spend, because we're always in an arms race with the competitors in our industry. Any advantage we can get, we need to take, or we lose our edge and market share.
We have access to Anthropic, OpenAI and Google models.
I run all my sessions on their best models with max thinking, because I don't care to optimise token usage at this stage. We are still learning every day about how to optimise our workflows, but I will say that I don't typically experience what you're describing.
I have very opinionated AGENTS.md files at the repo level, plus more at various other levels in the repo where more specialised rules are needed. I don't want those specialised rules in my context unless that specific section of the codebase is going to be used or touched. I make a lot of use of skills. And my sessions are almost all "spec driven", in the sense that I type out an opinionated requirement to the LLM, tell it to challenge my thinking, to push back, and to iterate on its own thinking, then to formulate a plan, and once that's done, to go over the plan again to find any issues. I will then review the plan, or wing it, depending on the task. After that I look at the overall code structure and design it has produced. I have strong, opinionated coding rules in my AGENTS file, and strong testing requirements (mostly end-to-end, not unit style).
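To make the nested AGENTS.md idea concrete, here is a rough Python sketch of the scoping I mean: given a file the agent is about to touch, collect only the AGENTS.md files on the path from the repo root down to that file's directory. This is an illustration under assumptions, not how any particular tool does it. The function name `applicable_agents_files` is made up, and real agent tooling may discover or merge these files differently.

```python
from pathlib import Path


def applicable_agents_files(repo_root: Path, target: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return the rule files that apply to `target`, ordered root first.

    Hypothetical helper: assumes rules apply to everything below the
    directory that contains them, and nothing outside it.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    # Use the target itself if it is a directory, otherwise its parent.
    start = target if target.is_dir() else target.parent
    # relative_to raises ValueError if the target is outside the repo.
    relative = start.relative_to(repo_root)

    found: list[Path] = []
    current = repo_root
    # Walk from the root down one directory at a time.
    for part in ("", *relative.parts):
        if part:
            current = current / part
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
    return found
```

With rule files at the repo root and in `services/billing/`, a change under `services/billing/` would pick up both, while a change elsewhere under `services/` would only pick up the root one. That keeps the specialised rules out of context until they are relevant.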
I get really good results from this. But I will say we're working in a highly opinionated codebase. We have the fundamentals in place already, with rules for how you do everything, and the agent follows those rules pretty well. I'm not sure how well this would work on a messy codebase with a lot of conflicting design principles.