deaux 3 hours ago
1. It would be good to benchmark at least one other model from a different family to see whether this actually generalizes. Minimax 2.7 seems like a good candidate to keep it affordable. Until then we can't really tell whether it's just overfit on Gemini 3 Flash.

2. Until then, your landing page needs to mention that all the numbers come from running on Gemini 3 Flash. Currently there's no mention of Gemini at all.

3. I'm assuming that cheaper also means faster here, since the model is the same? If so, why not add time until task completion to the benchmarks to highlight another advantage? If it's the opposite and it takes longer (seems unlikely), it would be transparent to note that.

4. It would be good to note whether or not it supports skills, (nested) AGENTS.md, MCP and so on, for people considering migrating.
GodelNumbering 2 hours ago
Good points.

1. I have been trying to benchmark open-weights models but keep running into timeouts due to slow inference (Terminal-Bench tasks have strict timeouts that you are not allowed to modify). Posted my frustration here: https://www.reddit.com/r/LocalLLaMA/comments/1stgt39/the_fru...

2. Done (updated the GitHub README).

3. Yes, on average the times were shorter, but I did not benchmark it because the model outputs get slower at random times, so it would not be a rigorous benchmark.

4. Added info on this too.
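
On point 3, one way to report timings despite occasional slow responses is to repeat each task several times and summarize with the median and interquartile range instead of a single run or the mean. A minimal sketch, assuming per-run wall-clock durations in seconds are already collected; `summarize_durations` and the sample numbers are hypothetical and not taken from the actual benchmark:

```python
import statistics


def summarize_durations(seconds):
    """Summarize repeated wall-clock timings for one task.

    The median and interquartile range (IQR) are much less sensitive to
    a few unusually slow runs than the mean is.
    """
    if len(seconds) < 2:
        raise ValueError("need at least two timings to estimate spread")
    # quantiles(n=4) returns the three quartile cut points: Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(seconds, n=4)
    return {
        "runs": len(seconds),
        "median": statistics.median(seconds),
        "mean": statistics.mean(seconds),
        "iqr": q3 - q1,
    }


# Hypothetical timings: four typical runs and one slow outlier.
sample = [10.0, 11.0, 12.0, 13.0, 60.0]
summary = summarize_durations(sample)
print(summary)
```

With the outlier included, the mean is pulled well above the typical run while the median stays put, which is why the median is usually the safer headline number when inference speed fluctuates.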