| ▲ | AquinasCoder 9 hours ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
While I appreciate, they publish this information, it's increasingly hard to keep track of it all. I've lost the mental model of how different models at different effort levels perform and what tasks they are good at. In practice, I tend to just use the default on Claude Code that works well enough. But I wonder to what degree other users really play around with these settings to optimize for their project. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | m-dot-reviews 3 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've been plugging this perhaps too many times now, but I am trying to bootstrap a user-sourced corpus of exactly "what model is good at task X". So, not benchmarks, but high-level tasks. There's a bit of a ordering problem in that nobody wants to bother commenting on a site that has few comments - so PTAL and contribute if you can. https://model.reviews | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | matheusmoreira 5 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I always use Opus 4.8 at max effort for everything. The $20 subscription didn't have enough tokens, but the $100 one had too many of them. So now I just max out Opus in order to maintain 100% weekly utilization. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | nolok 6 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Same boat as you, and my answer is "... Except when I ask and overall or checkup task that is specifically heavy or overseeing in which case I use the maximum level" which lately meant ultracode. I'm not going to play around with thinking level every request because the goal is to make me save time not spend it in a different setting menu. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | brobdingnagians 6 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I tend to run it on High and then step it up for problems where I'm noticing it struggles, bump it back down after. Sometimes I accidentally leave a session in Ultracode for a day and wonder why things are taking so long, but generally happy with the results. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | sanderjd 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
What I want is a harness that knows how to optimize this kind of thing for me. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jimbo808 3 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It's really not that much. It's a bit hard to make sense of it not because it's hard to keep track of, but because they are being deceptive and opaque about what you're actually buying, and the thing you're paying for is different from one day to the next, as they fuck around with the parameters to boost subjective performance during a launch, then quietly degrade the service to cut costs. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jbvlkt 7 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Exactly this is my problem with all AI tools. I want someone else to create working tools for me so I can focus on my product. It is the same with other tools. I do not want to spent huge amounts of energy and time to setup my IDE, operating system or desk layout. I guess it is too early to have that now. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | tash_2s 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I also ended up using max effort/reasoning for both coding and general chat. They don't spend too much extra time on simple tasks these days. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | throwaway219450 4 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Same advice as ever? We call it context engineering now, but prompt engineering still matters a lot. Most of the failures I run into are unspecified assumptions made by the model that derails the conversation, but usually updating the first prompt fixes it. Opus in my experience is a bit better about checking assumptions, while Sonnet will plow on ahead. An example is mentioning a file that doesn't exist: Sonnet will go ahead and try to grep your entire hard drive for it. Opus will say it's not local and request the path. I trust neither for general knowledge and I still find Opus giving me answers that are completely BS. But the token spend for Q&A is nothing compared to coding, so I always use Opus + a lot of thinking. For coding, I find Opus to be better value/token but I haven't done any sort of rigorous test. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | deadbabe 4 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There are token optimization consultants that can help organizations find the right balance of models for their employees to minimize costs. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | j45 5 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Just because it’s hard to keep track of doesn’t mean it’s not relevant. Playing around with learning the differences is incredibly helpful to schedule on ones calendar weekly for an hour or two, while saving links throughout the week to try out. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | paulddraper 6 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It's almost like you want an automatically intelligent choice of your artificial intelligence. Understandable frankly. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jacooper 9 hours ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Just use deepswe as a reference point. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||