hrmtst93837 6 hours ago

The tough part is that the "core team" can't see inside most model updates, so even with great tests, the model's judgment calls can change silently and break contracts you didn't even know you had. Traditional monitoring catches obvious failures, but subtle regressions or drift in LLM outputs need their own bag of tricks. If you treat an LLM integration like any other code library, you'll be chasing ghosts every time the upstream swaps a training set or tweaks a prompt template.
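One of those tricks is to pin "golden" answers and compare new responses against them with a similarity threshold instead of exact-match assertions, so harmless rewording passes but a silent behavior change trips the alarm. A minimal sketch, using token-overlap (Jaccard) similarity — the metric and the 0.4 threshold are illustrative assumptions; real setups often use embedding similarity or an LLM judge:

```python
import re

def tokens(s: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def check_drift(golden: str, response: str, threshold: float = 0.4) -> bool:
    """True if the response has drifted below the similarity threshold.
    The threshold is a tunable assumption, not a standard value."""
    return jaccard(golden, response) < threshold

# Pinned answer from the last known-good model version (hypothetical example).
golden = "The refund window is 30 days from the date of purchase."
ok_reword = "Refunds are allowed within 30 days of the purchase date."
drifted = "Refunds are handled case by case by the support team."

print(check_drift(golden, ok_reword))  # False: mild rewording, same contract
print(check_drift(golden, drifted))    # True: silent policy change
```

Running this against a corpus of pinned prompts on every upstream model bump is cheap, and it surfaces exactly the "contracts you didn't know you had" before users do.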

_pdp_ 5 hours ago | parent [-]

This is no different from receiving PRs from anonymous users on the Internet. Some of the more successful open source projects are already doing this at scale.