jjani 5 days ago
This was anything but "temporary": it's still in place. The last time we ran the evals was 2 weeks ago, and the results were exactly the same. It can't be a "capacity glitch" either, as it actually outputs those as proper tokens. It's possible that it was an internal system prompt change, despite the claims that "there is no system prompt on the API", but that is in effect the same as changing the model.

> There IS evidence that would satisfy me, but I'd need to see it.

Describe what this evidence would look like. It sure feels like an appeal to authority: if I were someone with a "name", I'm sure you'd believe it. If you had had the same set of evals set up since then, you wouldn't have questioned this at all. You don't.

> I don't think you're making it up, but without a lot more details I can't be convinced that your methodology was robust enough to prove what you say it shows.

Go and poke holes in it then, go on. I've clearly explained the methodology.