wavemode 2 days ago
Is an extra 7 percent on the HLE benchmark really worth the cost of running an entire ensemble of models?
kenmu 2 days ago
I mentioned in another comment that I make sure the cost/time stays within 1.25x of the next best single-model run. So it's not perfect, but I think that aspect will only get better with time. Of course I'm biased, but using Sup has been great for me personally. Even disregarding the HLE score, having many different perspectives in the answers, and most importantly the combined answer, has been very helpful as feedback on architectural decisions I make for Sup, and for many other questions I would normally ask ChatGPT/Gemini/Claude/Grok individually.
kelseyfrog 2 days ago
Depends on the use-case and requirements.