| ▲ | spruce_tips 11 hours ago | |
no, i don't think it's a heavier lift to test different model families. my point was that swapping models, whether that's to different model families or to new versions in the same model family, isn't straightforward. i'm reluctant to both upgrade model versions AND to swap model families, and that in itself is a type of stickiness that multiple model providers have. maybe another way of saying the same thing is that there is still a lot of work to make eval tooling a lot better! | ||
| ▲ | DenisM 5 hours ago | |
Continuous eval is unavoidable even absent model changes. Agents are keeping memories, tools evolve over time, external data changes, new exploits are being deployed, partner agents do get upgraded. There's too much entropy in the system. Context babysitting is our future. | ||