Davidzheng 2 hours ago

but degradation from servers being overloaded would be the type of degradation this SHOULD measure, no? Unless it's only intended for catching them quietly distilling models (which they claim not to do? idk for certain)

botacode an hour ago | parent | next [-]

Load just makes LLMs behave less deterministically and likely degrade. See: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

They don't have to be malicious operators in this case. It just happens.
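The mechanism in the linked post can be illustrated without any ML at all: floating-point addition is not associative, so if load changes the batch size and therefore the order in which a reduction is performed, the low bits of the result can change. A minimal, self-contained sketch (not Anthropic's code):

```python
# Floating-point addition is not associative: regrouping the same three
# numbers -- analogous to a kernel splitting a reduction differently
# under a different batch size -- changes the result in the low bits.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(left_to_right)              # 0.6000000000000001
print(regrouped)                  # 0.6
print(left_to_right == regrouped) # False
```

At the scale of a transformer forward pass, drift like this in the logits can flip which token gets sampled, so outputs diverge with no malicious intent anywhere.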

bgirard 31 minutes ago | parent | next [-]

> malicious

It doesn't have to be malicious. If my workflow is to send a prompt once and hopefully accept the result, then degradation matters a lot. If degradation is causing me to silently get worse code output on some of my commits it matters to me.

I care about -expected- performance when picking which model to use, not optimal benchmark performance.

altcognito 24 minutes ago | parent | prev [-]

Explain this though. The code is deterministic, even if it relies on pseudo random number generation. It doesn't just happen, someone has to make a conscious decision to force a different code path (or model) if the system is loaded.

megabless123 2 hours ago | parent | prev | next [-]

noob question: why would increased demand result in decreased intelligence?

exitb an hour ago | parent | next [-]

An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.
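To make the "quantization knob" concrete, here is a hypothetical sketch (the function names and values are illustrative, not any provider's actual pipeline) of rounding weights to 8-bit integers: it shrinks memory and speeds up inference, at the cost of a bounded precision loss per weight.

```python
# Hypothetical int8 weight quantization: map floats onto [-127, 127]
# with a single scale factor, then reconstruct and measure the error.
def quantize_int8(ws):
    scale = max(abs(w) for w in ws) / 127
    return [round(w / scale) for w in ws], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.8113, -0.52, 0.0031, 1.27]      # toy stand-ins for model weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))

# Rounding error per weight is at most half the quantization step.
print(max_err <= scale / 2 + 1e-9)           # True
```

Each individual error is tiny, but accumulated across billions of weights it is exactly the sort of quality loss that shows up in outputs without showing up in an error message.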

codeflo an hour ago | parent | next [-]

This is intentional? I think delivering lower quality than what was advertised and benchmarked is borderline fraud, but YMMV.

TedDallas an hour ago | parent | next [-]

Per Anthropic’s RCA, linked in the OP's post, for the September 2025 issues:

“… To state it plainly: We never reduce model quality due to demand, time of day, or server load. …”

So according to Anthropic, they are not tweaking quality settings due to demand.

rootnod3 an hour ago | parent | next [-]

And according to Google, they always delete data if requested.

And according to Meta, they always give you ALL the data they have on you when requested.

entropicdrifter 19 minutes ago | parent | next [-]

>And according to Google, they always delete data if requested.

However, the request form is on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard'.

groundzeros2015 13 minutes ago | parent | prev [-]

What would you like?

cmrdporcupine 34 minutes ago | parent | prev | next [-]

I guess I just don't know how to square that with my actual experiences then.

I've seen sporadic drops in reasoning skills that made me feel like it was January 2025, not 2026 ... inconsistent.

root_axis 3 minutes ago | parent [-]

I wouldn't doubt that these companies would deliberately degrade performance to manage load, but it's also true that humans are notoriously terrible at identifying random distributions, even with something as simple as a coin flip. It's very possible that what you view as degradation is just "bad RNG".

cmrdporcupine a few seconds ago | parent [-]

yep stochastic fantastic

these things are by definition hard to reason about

mcny an hour ago | parent | prev | next [-]

Personally, I'd rather get queued up with a longer wait time. I mean, not ridiculously long, but I am ok waiting five minutes to get correct, or at least more correct, responses.

Sure, I'll take a cup of coffee while I wait (:

lurking_swe an hour ago | parent [-]

i’d wait any amount of time lol.

at least i would KNOW it’s overloaded and i should use a different model, try again later, or just skip AI assistance for the task altogether.

direwolf20 an hour ago | parent | prev | next [-]

They don't advertise a certain quality. You take what they have or leave it.

denysvitali an hour ago | parent | prev | next [-]

If there's no way to check, then how can you claim it's fraud? :)

chrisjj an hour ago | parent | prev | next [-]

There is no level of quality advertised, as far as I can see.

bpavuk an hour ago | parent | prev | next [-]

> I think delivering lower quality than what was advertised and benchmarked is borderline fraud

welcome to Silicon Valley, I guess. everything from Google Search to Uber is fraud. Uber is a classic example of this playbook, even.

copilot_king an hour ago | parent | prev [-]

If you aren't defrauding your customers you will be left behind in 2026

rootnod3 an hour ago | parent [-]

That number is a sliding window, isn't it?

sh3rl0ck 22 minutes ago | parent | prev [-]

I'd wager that lower tok/s vs lower quality of output would be two very different knobs to turn.

awestroke an hour ago | parent | prev | next [-]

I've seen some issues with garbage tokens (seemed to come from a completely different session, mentioned code I've never seen before, repeated lines over and over) during high load. I suspect Anthropic has some threading bugs or race conditions in their caching/inference code that only surface under very high load.

vidarh 2 hours ago | parent | prev | next [-]

It would happen if they quietly decide to serve up more aggressively distilled / quantised / smaller models when under load.

chrisjj an hour ago | parent [-]

They advertise the Opus 4.5 model. Secretly substituting a cheaper one to save costs would be fraud.

kingstnap 31 minutes ago | parent [-]

Old-school Gemini used to do this. It was super obvious because midday the model would go from stupid to completely brain-dead. I have a screenshot of Google's FAQ on my PC from 2024-09-13 that says this (I took it to post to discord):

> How do I know which model Gemini is using in its responses?

> We believe in using the right model for the right task. We use various models at hand for specific tasks based on what we think will provide the best experience.

Wheaties466 an hour ago | parent | prev [-]

from what I understand this can come from the batching of requests.

chrisjj an hour ago | parent [-]

So, a known bug?

cmrdporcupine 2 hours ago | parent | prev [-]

I've personally witnessed large variability in behaviour even within a given session -- which makes sense, as there's nothing stopping Anthropic from shuttling your context/session around, load-balanced across many different servers, some of which might be quantized heavily to manage load and others not at all.

I don't know if they do this or not, but the nature of the API is such that you could absolutely load balance this way. The context sent at each point is not, I believe, "sticky" to any server.

TLDR you could get a "stupid" response and then a "smart" response within a single session because of heterogeneous quantization / model behaviour in the cluster.
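The routing pattern described above can be sketched in a few lines. Everything here is hypothetical (replica names, precisions, the random router); the point is only that a stateless API, where the full context travels with every request, leaves the balancer free to pick a different replica every turn:

```python
# Hypothetical stateless routing: no session affinity, so each turn of a
# conversation may land on a differently-quantized replica.
import random

REPLICAS = [
    {"name": "replica-a", "precision": "bf16"},  # full-precision weights
    {"name": "replica-b", "precision": "int8"},  # quantized to absorb load
]

def route(request_context):
    # The context is resent in full with each request, so any replica
    # can serve it -- the balancer doesn't need to remember the session.
    return random.choice(REPLICAS)

random.seed(1)  # fixed seed so the sketch is reproducible
turns = [route(f"turn {i}") for i in range(6)]
print([t["precision"] for t in turns])
```

Under this (assumed) setup, a "stupid" turn and a "smart" turn in the same session are just two requests that happened to land on different replicas.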

epolanski 2 hours ago | parent [-]

I've defended opus in the last weeks but the degradation is tangible. It feels like it degraded by a generation tbh.

cmrdporcupine an hour ago | parent [-]

it's just extremely variable