> I'm surprised there is significant variation between any of the frontier models?
This comment of mine is a bit dated, but even the same model's output can vary significantly if you change just a few words in the prompt.
https://news.ycombinator.com/item?id=42506554