> Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).

Interesting. Whats that “scaffold”? A sort of unit test framework for proofs?

▲

inkysigma 9 hours ago | parent [-]

I think in this context, scaffolds are generally the harness that surrounds the actual model. For example, any tools, ways to lay out tasks, or auto-critiquing methods.

I think there's quite a bit of variance in model performance depending on the scaffold so comparisons are always a bit murky.

	▲	readitalready 9 hours ago \| parent [-]
		Usually involves a lot of agents and their custom contexts or system prompts.