| ▲ | spruce_tips 12 hours ago |
| I agree there is no moat to the mechanics of switching models, i.e. what OpenRouter does. But it's not as straightforward as everyone says to swap out the model powering a workflow that has been tuned around that model, whether the tuning was purposeful or accidental. It takes time to re-evaluate that the new model works as well as or better than the old one. That said, I don't believe OpenAI's models consistently produce the best results. |
|
| ▲ | raw_anon_1111 12 hours ago | parent [-] |
| You need a way to test model changes regardless, since models within the same family change. Is it really a heavier lift to test a different model family than it is to test going from GPT 3.5 to GPT 5, or even than re-testing as you modify your prompts? |
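|
| A minimal sketch of what that side-by-side testing might look like, assuming an OpenAI-compatible client, a toy exact-match scorer, and placeholder model names (all assumptions for illustration, not anything from the thread): |

```python
# Sketch: regression-test a model swap by running one eval suite against
# the old and new model and comparing pass rates.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Tiny stand-in eval set; real suites are workflow-specific and much larger.
EVAL_SET = [
    {"prompt": "Reply with exactly one word: the capital of France.",
     "expected": "paris"},
    {"prompt": "Reply with exactly one digit: 2 + 2.",
     "expected": "4"},
]

def pass_rate(model: str) -> float:
    """Fraction of eval cases the given model answers correctly."""
    passed = 0
    for case in EVAL_SET:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": case["prompt"]}],
            temperature=0,  # keep runs comparable
        )
        answer = (resp.choices[0].message.content or "").strip().lower().rstrip(".")
        passed += answer == case["expected"]
    return passed / len(EVAL_SET)

if __name__ == "__main__":
    old, new = pass_rate("gpt-3.5-turbo"), pass_rate("gpt-5")
    print(f"old: {old:.0%}  new: {new:.0%}")
    if new < old:
        raise SystemExit("regression: don't swap yet")
```

| Pointing the client at another provider is mostly a `base_url` change, which is the easy part; the hard part spruce_tips describes is building an eval set that actually captures the tuned workflow. |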
| ▲ | spruce_tips 11 hours ago | parent [-] |
| No, I don't think it's a heavier lift to test different model families. My point was that swapping models, whether to a different family or to a newer version within the same family, isn't straightforward. I'm reluctant both to upgrade model versions and to swap model families, and that in itself is a kind of stickiness that multiple model providers have. Maybe another way of saying the same thing: there is still a lot of work to do to make eval tooling a lot better! |
|
| ▲ | DenisM 5 hours ago | parent [-] |
| Continuous eval is unavoidable even absent model changes. Agents keep memories, tools evolve over time, external data changes, new exploits are deployed, and partner agents get upgraded. There's too much entropy in the system. Context babysitting is our future. |
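|
| One way to picture that: a scheduled job that re-runs a fixed suite against a pinned model and flags drift from a stored baseline. The file name, threshold, and stub scorer below are illustrative assumptions, not anything DenisM specified: |

```python
# Sketch of continuous eval: even with the model pinned, re-run the same
# suite on a schedule and flag drift against a stored baseline, since
# memories, tools, and external data keep shifting underneath it.
import json
import time
from pathlib import Path

BASELINE = Path("eval_baseline.json")  # assumed location
DRIFT_THRESHOLD = 0.05  # alert if the pass rate drops more than 5 points

def run_suite() -> float:
    # Stand-in for a real harness like the one sketched upthread; a real
    # version would exercise the agent's live tools, memories, and data.
    return 0.90

def check_drift() -> None:
    score = run_suite()
    if BASELINE.exists():
        previous = json.loads(BASELINE.read_text())["score"]
        if previous - score > DRIFT_THRESHOLD:
            # In practice: alert a human, open a ticket, or roll back.
            print(f"drift: {score:.0%} vs baseline {previous:.0%}")
    BASELINE.write_text(json.dumps({"score": score, "ts": time.time()}))

if __name__ == "__main__":
    while True:  # a cron job is the more usual choice
        check_drift()
        time.sleep(24 * 60 * 60)  # daily
```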