altcunn · 3 hours ago
This is the underrated risk that nobody talks about enough. We've already seen it play out with the Codex deprecation, the GPT-4 behavior drift saga, and every time Anthropic bumps a model version.

The practical workaround most teams land on is treating the model as a swappable component behind a thick abstraction layer. Pin to a specific model version, run evals on every new release, and only upgrade when your test suite passes. But that's expensive engineering overhead that shouldn't be necessary.

What's missing is something like semantic versioning for model behavior. If a provider could guarantee "this model will produce outputs within X similarity threshold of the previous version for your use case," you could actually build with confidence. Instead we get "we improved the model" and your carefully tuned prompts break in ways you discover from user complaints three days later.
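For what it's worth, the pin-and-eval gate doesn't have to be elaborate. A minimal sketch of the shape of it (call_model, the eval cases, the version strings, and the pass threshold are all placeholders for whatever your stack actually uses, not any provider's real API):

    # Keep the pinned model until the candidate clears your eval suite.
    # call_model() is a stub -- wire it to your own provider client.
    PINNED_MODEL = "provider-model-2024-06-01"      # hypothetical version strings
    CANDIDATE_MODEL = "provider-model-2024-09-01"

    EVAL_CASES = [
        # (prompt, check) pairs drawn from real traffic / regression cases
        ("Summarize this ticket: ...", lambda out: len(out) < 500),
        ("Extract the order ID from: ...", lambda out: out.strip().isdigit()),
    ]

    def call_model(model_id: str, prompt: str) -> str:
        raise NotImplementedError("replace with your provider client call")

    def pass_rate(model_id: str) -> float:
        # Fraction of eval cases whose check passes for this model version.
        results = [check(call_model(model_id, prompt)) for prompt, check in EVAL_CASES]
        return sum(results) / len(results)

    def choose_model(threshold: float = 0.98) -> str:
        # Only move the pin when the candidate meets the bar; otherwise stay put.
        if pass_rate(CANDIDATE_MODEL) >= threshold:
            return CANDIDATE_MODEL
        return PINNED_MODEL

The point is just that the upgrade decision becomes a test result instead of a changelog sentence, which is the closest you can get to semver-for-behavior today without the provider's help.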