megabless123 3 hours ago

noob question: why would increased demand result in decreased intelligence?

exitb 3 hours ago | parent | next [-]

An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.
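To make the quantization knob concrete, here's a toy round-trip through int8 (all numbers invented, nothing to do with any real serving stack): lower precision saves memory and compute, but every weight picks up a small error.

```python
import numpy as np

# Toy illustration: symmetric int8 quantization of a few "weights".
# Values are random; this is not any real model or serving code.
rng = np.random.default_rng(0)
weights = rng.standard_normal(8).astype(np.float32)

scale = np.abs(weights).max() / 127.0            # one scale for the tensor
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

# Rounding error is bounded by scale/2 per weight: small, but nonzero,
# and it accumulates across billions of parameters.
error = np.abs(weights - dequantized).max()
print(f"max round-trip error: {error:.6f}")
```

The point is only that the knob is continuous: an operator can trade a little output quality for a lot of throughput without anything visibly breaking.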

codeflo 3 hours ago | parent | next [-]

This is intentional? I think delivering lower quality than what was advertised and benchmarked is borderline fraud, but YMMV.

TedDallas 2 hours ago | parent | next [-]

Per Anthropic's RCA (linked in the OP's post) for the September 2025 issues:

“… To state it plainly: We never reduce model quality due to demand, time of day, or server load. …”

So according to Anthropic, they are not tweaking quality settings due to demand.

rootnod3 2 hours ago | parent | next [-]

And according to Google, they always delete data if requested.

And according to Meta, they always give you ALL the data they have on you when requested.

entropicdrifter 2 hours ago | parent | next [-]

>And according to Google, they always delete data if requested.

However, the request form is on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard'.

groundzeros2015 2 hours ago | parent | prev [-]

What would you like?

AlexandrB an hour ago | parent [-]

An SLA-style contractually binding agreement.

chrisjj an hour ago | parent | prev | next [-]

That's about model quality. Nothing about output quality.

stefan_ 35 minutes ago | parent | prev | next [-]

That's what's called an "overly specific denial". It sounds more palatable to say "we deployed a newly quantized model of Opus, and here are cherry-picked benchmarks to show it's the same", and even that they don't announce publicly.

cmrdporcupine 2 hours ago | parent | prev | next [-]

I guess I just don't know how to square that with my actual experiences then.

I've seen sporadic drops in reasoning skills that made me feel like it was January 2025, not 2026 ... inconsistent.

quadrature an hour ago | parent | next [-]

LLMs sample the next token from a conditional probability distribution; the hope is that dumb sequences are less probable, but they will still happen naturally.
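A minimal sketch of what that sampling looks like, with a made-up three-token vocabulary and toy logits (no real model involved):

```python
import math, random

# Toy next-token distribution: sensible continuations get higher
# probability, but "dumb" ones keep nonzero mass. Logits are invented.
logits = {"the": 2.0, "a": 1.5, "banana": -1.0}

def sample(logits, temperature=1.0, rng=random):
    # Softmax with temperature, then one multinomial draw.
    zs = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    r = rng.random() * sum(zs.values())
    for token, z in zs.items():
        r -= z
        if r <= 0:
            return token
    return token

random.seed(0)
counts = {tok: 0 for tok in logits}
for _ in range(10_000):
    counts[sample(logits)] += 1
# "banana" is unlikely on every single draw, yet over many draws
# it is guaranteed to show up sometimes.
print(counts)
```

That's the point about dumb sequences: they aren't a malfunction, they're the tail of the distribution you sample from thousands of times per session.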

tempaccount420 4 minutes ago | parent [-]

It's more like the choice between "the" and "a" than "yes" and "no".

root_axis an hour ago | parent | prev [-]

I wouldn't doubt that these companies would deliberately degrade performance to manage load, but it's also true that humans are notoriously terrible at identifying random distributions, even with something as simple as a coin flip. It's very possible that what you view as degradation is just "bad RNG".
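The coin-flip point is easy to demonstrate: a perfectly fair coin routinely produces streaks that feel non-random. A quick sketch (toy code, nothing model-specific):

```python
import random

def longest_run(flips):
    """Length of the longest streak of identical outcomes."""
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

random.seed(42)
flips = [random.choice("HT") for _ in range(100)]
# Fair coins typically produce runs of 5+ heads or tails somewhere
# in 100 flips; people tend to read such streaks as a pattern
# (or, with LLMs, as "the model got dumber today").
print(longest_run(flips))
```

A run of bad completions can be exactly this: a streak from an unchanged distribution.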

cmrdporcupine an hour ago | parent [-]

yep stochastic fantastic

these things are by definition hard to reason about

2 hours ago | parent | prev [-]
[deleted]
direwolf20 2 hours ago | parent | prev | next [-]

They don't advertise a certain quality. You take what they have or leave it.

mcny 2 hours ago | parent | prev | next [-]

Personally, I'd rather get queued up with a longer wait time. I mean, not ridiculously long, but I'm OK waiting five minutes to get correct, or at least more correct, responses.

Sure, I'll take a cup of coffee while I wait (:

lurking_swe 2 hours ago | parent [-]

i’d wait any amount of time lol.

at least i would KNOW it’s overloaded and i should use a different model, try again later, or just skip AI assistance for the task altogether.

denysvitali 2 hours ago | parent | prev | next [-]

If there's no way to check, then how can you claim it's fraud? :)

chrisjj 2 hours ago | parent | prev | next [-]

There is no level of quality advertised, as far as I can see.

pseidemann 27 minutes ago | parent [-]

What is "level of quality"? Doesn't this apply to any product?

chrisjj 9 minutes ago | parent [-]

[delayed]

bpavuk 2 hours ago | parent | prev | next [-]

> I think delivering lower quality than what was advertised and benchmarked is borderline fraud

welcome to Silicon Valley, I guess. everything from Google Search to Uber is fraud. Uber is a classic example of this playbook, even.

copilot_king 2 hours ago | parent | prev [-]

If you aren't defrauding your customers you will be left behind in 2026

rootnod3 2 hours ago | parent [-]

That number is a sliding window, isn't it?

sh3rl0ck 2 hours ago | parent | prev [-]

I'd wager that lower tok/s vs lower quality of output would be two very different knobs to turn.

awestroke 3 hours ago | parent | prev | next [-]

I've seen some issues with garbage tokens during high load (output that seemed to come from a completely different session, mentioned code I've never seen before, repeated lines over and over). I suspect Anthropic has some threading bugs or race conditions in their caching/inference code that only surface under very high load.
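Purely speculative, but one shape such a bug could take is a reused per-request buffer that isn't cleared between sessions. A toy sketch (every name here is invented, not Anthropic's code):

```python
# Toy model of a batched server that reuses per-slot buffers.
# If a slot isn't reset before reuse, the next request can see
# the previous session's tokens. All names are hypothetical.

class BatchSlot:
    def __init__(self):
        self.tokens = []

def serve(slot, prompt_tokens, clear=True):
    if clear:
        slot.tokens = []       # correct path: reset state per request
    slot.tokens.extend(prompt_tokens)
    return list(slot.tokens)

slot = BatchSlot()
serve(slot, ["fn", "main", "()"])                    # session A
leaked = serve(slot, ["SELECT", "*"], clear=False)   # session B, buggy path
print(leaked)  # session B's output now contains session A's tokens
```

Under low load each request might get a fresh slot, so the bug would only show up when slots start getting reused under pressure, which matches the "only during high load" pattern.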

vidarh 3 hours ago | parent | prev | next [-]

It would happen if they quietly decide to serve up more aggressively distilled / quantised / smaller models when under load.

seunosewa an hour ago | parent | next [-]

Or just reducing the reasoning tokens.

chrisjj 2 hours ago | parent | prev [-]

They advertise the Opus 4.5 model. Secretly substituting a cheaper one to save costs would be fraud.

kingstnap 2 hours ago | parent [-]

Old school Gemini used to do this. It was super obvious because midday the model would go from stupid to completely brain-dead. I have a screenshot of Google's FAQ on my PC from 2024-09-13 that says this (I took it to post to Discord):

> How do I know which model Gemini is using in its responses?

> We believe in using the right model for the right task. We use various models at hand for specific tasks based on what we think will provide the best experience.

chrisjj an hour ago | parent [-]

> We use various models at hand for specific tasks based on what we think will provide the best experience

... for Google :)

Wheaties466 3 hours ago | parent | prev [-]

From what I understand, this can come from the batching of requests.

chrisjj 2 hours ago | parent [-]

So, a known bug?