highfrequency 5 hours ago
Can you be more specific about which math results you are talking about? It looks like a significant improvement on FrontierMath, especially for the Pro model (which uses the most inference-time compute).
ZeroCool2u 5 hours ago
FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks where I noticed this.
| |||||||||||||||||