keeda a day ago

Relevant Pragmatic Engineer newsletter with many more cases along these lines, along with how some people are handling them: https://newsletter.pragmaticengineer.com/p/the-pulse-token-s...

Tokenmaxxing seems more and more like a way to encourage experimentation and learning, and incidents like this are a part of learning. Like, today devs simply use the most expensive model by default, even to do extremely simple things. This is obviously wasteful and costly, and budgets will soon be imposed, but this is how they're figuring out the economics.

For instance, just as we estimate story points, we may estimate token budgets. At that point, why waste time and money invoking a model for a simple refactor you could do with a few keystrokes in an IDE? And why use a frontier model when an open-source local model could spit out that throwaway script? Local models can be tokenmaxxed; frontier models will still be needed, but used judiciously. These are essentially trade-offs, and they will eventually be decided empirically, which is what engineering is largely about.
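As a hypothetical sketch of that trade-off (model names, capability tiers, and prices below are all invented for illustration, not real pricing): given an estimated token budget, route each task to the cheapest model whose capability covers it.

```python
# Hypothetical model routing based on an estimated token budget.
# Model names, tiers, and per-token prices are illustrative assumptions.

MODELS = [
    # (name, capability tier, $ per 1K tokens)
    ("local-oss", 1, 0.0),    # local open-source model: ~free per token
    ("mid-tier",  2, 0.002),
    ("frontier",  3, 0.03),
]

def route(task_tier: int, est_tokens: int):
    """Pick the cheapest model whose tier covers the task; return (name, cost)."""
    candidates = [m for m in MODELS if m[1] >= task_tier]
    name, _, price = min(candidates, key=lambda m: m[2])
    return name, round(est_tokens / 1000 * price, 4)

# A throwaway script (tier 1) goes to the local model for free;
# a gnarly cross-cutting refactor (tier 3) justifies frontier spend.
print(route(1, 4000))   # ('local-oss', 0.0)
print(route(3, 20000))  # ('frontier', 0.6)
```

Under these made-up numbers the economics fall out directly: the frontier model is reserved for the tasks only it can handle.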

So economics will soon push engineers back to do what they're paid to do: engineering. Just that it will look very different compared to what we're used to.

great_psy 15 hours ago | parent [-]

This is the first time I've heard of estimating tokens for a task.

I feel like you’re on to something. Management will pick this up and make it part of sprint planning.

Engineers will pull their hair out wondering how you can possibly do that.

That’s like estimating how many CPU cycles a task will take: how many instructions your laptop will execute while you work on something.

keeda 5 hours ago | parent [-]

Yeah, and I expect estimating token budgets will follow the same trajectory (with the same accompanying annoyances) as estimating and tracking story points!

But done with the right mindset and a proper awareness of the inherent uncertainty, you can arrive at reasonable estimates over time: start with T-shirt-size estimates, then adapt based on actual numbers. Soon enough the team gets a sense of the nuances of the project and its dependencies, and estimates get more accurate.
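One way to sketch that feedback loop (the sizes and token counts here are invented for illustration): keep a running average of actual token spend per T-shirt size, and let the average replace the initial guess.

```python
from collections import defaultdict

# Hypothetical calibration loop: per-size estimates start as guesses
# and converge toward the team's observed actuals over time.
class TokenEstimator:
    def __init__(self, initial):
        self.estimates = dict(initial)    # size -> current estimate
        self.actuals = defaultdict(list)  # size -> observed token counts

    def record(self, size, actual_tokens):
        self.actuals[size].append(actual_tokens)
        history = self.actuals[size]
        # Replace the guess with the running mean of actuals.
        self.estimates[size] = sum(history) / len(history)

    def estimate(self, size):
        return self.estimates[size]

# Initial guesses (made up), then a couple of observed actuals for "M" tasks.
est = TokenEstimator({"S": 2_000, "M": 10_000, "L": 50_000})
for tokens in (14_000, 18_000):
    est.record("M", tokens)
print(est.estimate("M"))  # 16000.0 -- the guess has been replaced by data
```

A real team would probably weight recent sprints more heavily (e.g. an exponential moving average), but the shape of the loop is the same.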

As such, the example of estimating CPU cycles for tasks is actually apt. It's common practice in real-time embedded systems running on tiny microcontrollers, and it's also possible to get good estimates for more complex applications, OSes, and architectures simply by benchmarking them over time.
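A rough sketch of that benchmarking approach (the 3 GHz clock is an assumed figure, and the workload is a stand-in; a real setup would read hardware performance counters rather than converting wall time):

```python
import time

ASSUMED_CLOCK_HZ = 3_000_000_000  # assumed 3 GHz; real cycle counts come from perf counters

def estimate_cycles(fn, runs=5):
    """Estimate cycles per call by converting the best of several timed runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * ASSUMED_CLOCK_HZ

def workload():
    # Stand-in task: sum the first 100k integers.
    return sum(range(100_000))

print(f"~{estimate_cycles(workload):.0f} cycles per call")
```

Taking the minimum over several runs filters out scheduler noise, which is the usual convention when benchmarking for a repeatable lower-bound estimate.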

The most common problem with planning and task estimation is that the corporate dynamics around it are unhealthy: leadership often treats estimates as SLAs instead of the SWAGs they are. I worked on a team whose estimates never matched the actual time taken, partly due to rather unpredictable dependencies and frequent interruptions from high-priority tasks. But because we were clearly very high-functioning, management never held it against us. Those were healthy corporate dynamics; not all places have them.