jona-f an hour ago

Kimi K2.6 and mimo 2.5 pro are ahead of deepseek v4 on other benchmarks. Anyway, great work: the benchmark seems to show clear separation between models, so it should be very useful for improving the math capabilities of the next generation of AI. I'm more interested in the prompt engineering/orchestration and technical details (what I can do without millions), but I get that you are mathematicians, so your focus is obviously on the math. Sorry for the nagging.