| ▲ | ponyous 6 hours ago | ||||||||||||||||
I just ran and scored 63 3D model generations (via code) across high and no reasoning. A 3D modeling benchmark quickly exposes a model's spatial, logic and code performance, so I think it's a very good indicator of quality. Here are the results compared to Gemini 3.5 Flash:
Although it is cheaper, it is significantly slower, and results are worse overall. Surprisingly, high reasoning produces fewer code errors than Gemini 3.5 Flash, but when I actually look at the models they are worse.
Edit: I recently ran evals with Kimi 2.7 and MiniMax-M3 and this is clearly the open source SOTA model, by far. | |||||||||||||||||
| ▲ | NiloCK 6 hours ago | parent | next [-] | ||||||||||||||||
Very interested in this! Can you share more about the modelling method (e.g., three.js?), the task list, and outputs here? I think there's probably some good juice to squeeze in terms of spatial awareness by doing a benchmark something like:
- give a 3D modelling task
- render and snapshot from a variety of angles
- feed to a third-party vision model for a "what is this" type query
- grade on end-to-end accuracy
Bonus points for asking the vision model something like "how beautiful is this, 1-10". | |||||||||||||||||
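A minimal sketch of that loop, in Python, just to make the steps concrete. Everything here is hypothetical: `generate`, `render` and `identify` stand in for the model under test, whatever renderer is used, and the third-party vision model, and are passed in as plain callables. Evenly spaced orbit cameras and majority-vote grading are assumptions of this sketch, not something described in the thread.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_cameras(n: int, radius: float = 3.0, elevation_deg: float = 30.0) -> List[Vec3]:
    """Return n camera positions evenly spaced on a circle around the origin."""
    elev = math.radians(elevation_deg)
    cams = []
    for i in range(n):
        az = 2 * math.pi * i / n
        cams.append((
            radius * math.cos(elev) * math.cos(az),
            radius * math.sin(elev),
            radius * math.cos(elev) * math.sin(az),
        ))
    return cams


def end_to_end_accuracy(
    tasks: Iterable[Tuple[str, str]],        # (prompt, expected label) pairs
    generate: Callable[[str], str],           # hypothetical: prompt -> model code
    render: Callable[[str, Vec3], object],    # hypothetical: code + camera -> image
    identify: Callable[[object], str],        # hypothetical: image -> "what is this" answer
    n_views: int = 4,
) -> float:
    """Fraction of tasks where the majority answer across views matches the label."""
    hits = 0
    total = 0
    for prompt, expected in tasks:
        code = generate(prompt)
        guesses = [
            identify(render(code, cam)).strip().lower()
            for cam in orbit_cameras(n_views)
        ]
        # Majority vote across the snapshots taken from different angles.
        best = Counter(guesses).most_common(1)[0][0]
        hits += int(best == expected.strip().lower())
        total += 1
    return hits / total if total else 0.0
```

The "how beautiful is this, 1-10" bonus would just be a second callable averaged over the same views. Exact string matching on the label is probably too strict for a real vision model, so a real harness would likely need fuzzier matching.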
| ▲ | ComputerGuru 5 hours ago | parent | prev [-] | ||||||||||||||||
Would you be able to run it against Gemini Flash (not Lite) 3.0, high thinking? | |||||||||||||||||