| ACCount37 3 hours ago | |
"They made the model dumber" on literally the same checkpoint with the same prompt on the same quantization running on the same hardware is a staple of AI complaints. Users are completely incapable of objectively evaluating model quality over time. Which makes it all the harder to notice actual "stealth nerfs", misconfigurations or other technical issues. Because "they made the model DUMBER, for REAL this time" is background noise. | ||
| dannyw 25 minutes ago | |
How are you so sure that frontier API models are always running the same quant/weights/etc.? You think OpenAI and Anthropic are running essentially just vLLM endpoints? Of course not. Firstly, we know Anthropic has been doing prompt injection into their first-party (1P) APIs (not Bedrock/Vertex, AFAIK) for at least a year now. https://old.reddit.com/r/ClaudeAI/comments/1f6hcwo/injection... This can be verified pretty quickly, like OP did: check the token metrics. If your context contains classifier-firing terms, you'll see the reported input_tokens come out higher than the token count of what you actually sent. So if they're already doing that, what makes you think it's just a dumb API, instead of a complicated pipeline filled with trade secrets and optimisations? | ||
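For anyone who wants to try the token-count check, here is a rough sketch of the arithmetic, not anyone's official method. It assumes you already have, for each request, your own count of the tokens you sent (from a tokenizer or a token-counting endpoint) and the `input_tokens` value the API reported back. The function name and all the numbers are hypothetical placeholders. Benign control prompts are used to estimate the fixed per-request overhead, so that only the excess on the test prompt counts as unexplained.

```python
from statistics import median


def excess_input_tokens(control_pairs, test_pair):
    """Estimate unexplained input tokens for one request.

    control_pairs: list of (local_count, reported_input_tokens) for benign prompts.
    test_pair: (local_count, reported_input_tokens) for the prompt under test.
    """
    # Benign prompts show the fixed overhead added to every request
    # (message framing and similar), so it is not mistaken for an injection.
    baseline = median(reported - local for local, reported in control_pairs)
    local, reported = test_pair
    # Whatever remains beyond the baseline is tokens you did not send.
    return reported - local - baseline


# Placeholder numbers for illustration only, not real measurements.
controls = [(12, 19), (40, 47), (85, 92)]
print(excess_input_tokens(controls, (30, 37)))  # 0: matches the usual overhead
print(excess_input_tokens(controls, (30, 82)))  # 45: tokens unaccounted for
```

A small nonzero excess can just be a tokenizer mismatch between your local count and the server's, so only a large, repeatable gap that appears with specific terms in the context would point at added text.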