Retr0id 3 hours ago

What is "drift"? It seems to be one of those words that LLMs love to say but it doesn't really mean anything ("gap" is another one).

Retr0id an hour ago | parent | next [-]

This definition satisfied me: https://universalpaperclips.fandom.com/wiki/Value_Drift

ryoshoe 25 minutes ago | parent [-]

Fandom sites continue to have one of the most unpleasant user experiences on the modern web. I just want to read the article without having to watch 4 different video ads...

jldugger 2 hours ago | parent | prev | next [-]

IDK how it applies to LLMs, but the original meaning was a change in a distribution over time. Say you had some model-based app trained on American English, but slowly more and more American Spanish speakers adopt your app: the training set distribution is drifting away from the actual usage distribution.

In that situation, your model accuracy will look good on holdout sets but underperform in users' hands.
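
To make that concrete, here is a toy sketch of the English/Spanish scenario. Everything in it is an assumption for illustration: the per-language accuracy numbers, the 10% monthly shift, and the helper names (`total_variation`, `expected_accuracy`, `simulate`) are made up, not taken from any real system. It only shows why the holdout number stays flat while live accuracy sinks as the user mix moves away from the training mix.

```python
# Hypothetical per-language accuracy of a model trained only on English text.
ACCURACY = {"en": 0.95, "es": 0.60}

# The training (and holdout) set was drawn from this mix of users.
TRAIN_MIX = {"en": 1.0, "es": 0.0}


def total_variation(p, q):
    """Total variation distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def expected_accuracy(mix):
    """Accuracy averaged over a population with the given language mix."""
    return sum(share * ACCURACY[lang] for lang, share in mix.items())


def simulate(months, monthly_shift=0.1):
    """Move `monthly_shift` of the user base from English to Spanish each month."""
    rows = []
    for month in range(months + 1):
        es = min(1.0, month * monthly_shift)
        live_mix = {"en": 1.0 - es, "es": es}
        rows.append({
            "month": month,
            # How far live usage has moved from the training distribution.
            "drift": total_variation(TRAIN_MIX, live_mix),
            # The holdout set comes from the training mix, so it never changes.
            "holdout_acc": expected_accuracy(TRAIN_MIX),
            # What users actually experience.
            "live_acc": expected_accuracy(live_mix),
        })
    return rows


if __name__ == "__main__":
    for row in simulate(5):
        print(f"month {row['month']}: drift={row['drift']:.2f} "
              f"holdout={row['holdout_acc']:.3f} live={row['live_acc']:.3f}")
```

With these made-up numbers the holdout accuracy stays at 0.95 every month, while the live accuracy falls as the Spanish share grows. The model itself never changed; only the population it serves did.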

idle_zealot 3 hours ago | parent | prev | next [-]

I believe it's businessspeak for "change." Gap is suittongue for "difference."

redanddead an hour ago | parent | prev [-]

there are many causes, but it’s a drift in performance

you can drift a tool via the harness in many ways

you can modify the system prompt

you can modify the underlying model powering the harness

you can use different “thinking” levels for different processes in the harness

you can change the entire way a system works via the harness, which could be better or worse, depending on many things

you can introduce anti-anti-slop within the harness to foil attempts from users using patch scripts

you can modify how your tool sends requests to your server depending on many variables

you can handle requests differently, depending on any variable of your choosing, at the server level

you can modify the compute allotment per user depending on many things, from the backend, without telling the user, it’s very easy. you can modify it dynamically depending on your own usage or the user’s cycle. Or their organization’s priority level as a customer. The weekly and daily usage management system is intricate, compute is very finite and must be managed

the user has literally no way to know and you have no legal obligation to tell them, you never made them any legally binding promises

the combination of so many factors that all affect each other means that you can, if you wanted to, create a new clusterfuck of an experience any time one of these (or some unknown variable) changes. it may not even be deliberate. the complexity grows exponentially, so you may not even be able to promise your users a specific standard

drift is not imagined, sure, but admitting to it could expose you to unneeded liability

Retr0id an hour ago | parent [-]

That's a lot of words without actually defining the term, although idle_zealot's suggestion of "change" seems to make grammatical sense as a replacement here.

redanddead an hour ago | parent [-]

yeah, figured i’d put some thought into it, you know?