| ▲ | micah94 an hour ago | ||||||||||||||||||||||||||||
Just a personal anecdote but I have not hit any more thresholds or limits since switching to the MAX plan and so far, it's been worth it. But I do wonder how long even this will last... | |||||||||||||||||||||||||||||
| ▲ | ygjb an hour ago | parent [-] | ||||||||||||||||||||||||||||
I think subscription models are sustainable, but longer term, we should probably expect to see more prompt optimization happening in the providers inference pipeline. For example, unless you explicitly tell the agent or API to use a specific model, fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically select the lowest cost model would probably already save alot of money (IDK if Claude/OpenAI do this on the backend, but several services I have worked on do some things like this to reduce costs of delivery customer facing inference at scale). | |||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||