6thbit 10 hours ago
> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).

Interesting. What's that “scaffold”? A sort of unit test framework for proofs?
inkysigma 9 hours ago
I think in this context, a scaffold is the harness that surrounds the actual model: for example, any tools it can call, the way tasks are laid out for it, or auto-critiquing methods. Model performance varies quite a bit depending on the scaffold, so comparisons between models are always a bit murky.
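To make that concrete, here is a rough sketch of what such a harness might look like. It is not the actual FrontierMath scaffold, whose details are not given here. Everything in it is hypothetical: `run_scaffold`, the `CALL name: argument` convention for tool use, and the toy model and critic that stand in for real model API calls.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: a real scaffold would call an actual model API here.
Model = Callable[[str], str]
Tool = Callable[[str], str]


def run_scaffold(model: Model, critic: Model, tools: Dict[str, Tool],
                 task: str, max_rounds: int = 3) -> str:
    """Lay out the task, let the model call tools, and retry on critique."""
    # Task layout: tell the model what to do and which tools exist.
    prompt = f"Task: {task}\nAvailable tools: {', '.join(sorted(tools))}"
    answer = ""
    for _round in range(max_rounds):
        answer = model(prompt)
        # Tool use (assumed convention): "CALL name: argument" runs a tool.
        if answer.startswith("CALL "):
            name, _sep, arg = answer[len("CALL "):].partition(":")
            name = name.strip()
            tool = tools.get(name)
            result = tool(arg.strip()) if tool else f"unknown tool {name!r}"
            prompt += f"\nTool {name} returned: {result}"
            continue
        # Auto-critique: a second call judges the answer before it is accepted.
        verdict = critic(f"Task: {task}\nProposed answer: {answer}")
        if verdict.strip().upper().startswith("OK"):
            return answer
        prompt += f"\nPrevious answer: {answer}\nCritique: {verdict}"
    return answer  # best effort after max_rounds


# Toy components so the sketch runs without any model behind it.
def toy_model(prompt: str) -> str:
    if "returned:" not in prompt:
        return "CALL calc: 6*7"
    return "The answer is " + prompt.rsplit("returned: ", 1)[1]


def toy_critic(prompt: str) -> str:
    return "OK" if "42" in prompt else "Missing the number"


def calc(expr: str) -> str:
    a, b = expr.split("*")
    return str(int(a) * int(b))


result = run_scaffold(toy_model, toy_critic, {"calc": calc}, "What is 6 times 7?")
print(result)
```

The same model can look better or worse depending on which tools it gets, how many retry rounds are allowed, and how strict the critic is. That is why scores from different scaffolds are hard to compare.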