cmrdporcupine 3 hours ago
I've personally witnessed large variability in behaviour even within a single session, which makes sense: nothing stops Anthropic from load-balancing your context/session across many different servers, some of which might be heavily quantized to manage load and others not at all. I don't know whether they actually do this, but the API is built so that you absolutely could load balance this way. As far as I know, the full context is sent with every request and isn't "sticky" to any one server. TL;DR: you could get a "stupid" response and then a "smart" one within a single session because of heterogeneous quantization / model behaviour across the cluster.
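To make the routing argument concrete, here's a toy sketch. It assumes a stateless router that picks a backend at random for each request, and compares it with a "sticky" router that hashes the session ID. Everything in it (the `Backend`/`Session` classes, the node names, the bf16/int8/int4 precision labels, `route_stateless`, `route_sticky`, `run_session`) is hypothetical and says nothing about how Anthropic actually serves its models.

```python
import random
import zlib
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    precision: str  # hypothetical quantization label, e.g. "bf16" or "int4"


@dataclass
class Session:
    session_id: str
    history: list[str] = field(default_factory=list)


def route_stateless(backends: list[Backend], rng: random.Random) -> Backend:
    # Each request is routed independently: nothing pins a session to a server.
    return rng.choice(backends)


def route_sticky(backends: list[Backend], session_id: str) -> Backend:
    # Sticky alternative: hash the session ID so a session always hits one server.
    return backends[zlib.crc32(session_id.encode()) % len(backends)]


def run_session(backends: list[Backend], turns: int, sticky: bool, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    session = Session("demo-session")
    served_by = []
    for turn in range(turns):
        session.history.append(f"user: turn {turn}")
        # The whole history travels with every request, so any backend can answer it.
        if sticky:
            backend = route_sticky(backends, session.session_id)
        else:
            backend = route_stateless(backends, rng)
        session.history.append(f"assistant ({backend.name}, {backend.precision})")
        served_by.append(backend.precision)
    return served_by


# Hypothetical cluster mixing full-precision and quantized replicas of one model.
CLUSTER = [
    Backend("node-a", "bf16"),
    Backend("node-b", "int8"),
    Backend("node-c", "int4"),
]

if __name__ == "__main__":
    print("stateless:", run_session(CLUSTER, turns=8, sticky=False))
    print("sticky:   ", run_session(CLUSTER, turns=8, sticky=True))
```

With the stateless router, consecutive turns of the same session land on replicas with different precisions. The sticky router keeps the session on one replica. So if the pool really were heterogeneous, "stupid then smart" within one session is what the non-sticky design would predict.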
epolanski 3 hours ago
I've defended Opus over the last few weeks, but the degradation is tangible. It feels like it's dropped a whole generation, tbh.