hhh 3 hours ago
These things can simply shift with infrastructure changes rather than being some mysterious A/B test.
jumploops an hour ago
I don't disagree; we've seen performance shift with capacity changes in the past. That said, I doubt OpenAI would choose to publish a single coding benchmark for a new model that exactly matches their previous model's score (88.8%).