WarmWash 9 hours ago
The open models only give the SOTA models a run for their money on gameable benchmarks. On the semi-private ARC-AGI 2 sets they do absolutely awfully (<10%, while SOTA is at ~80%). It might be too expensive, but I would be interested in the benchmarks for the current crop of SOTA models.
roenxi 8 hours ago | parent
Have the open models been tried? When I look at the leaderboard [0], the only Qwen model I see is 235B-A22B. I wouldn't expect an MoE model to do particularly well: from what I've seen (thinking mainly of a leaderboard trying to measure EQ [1]), MoE models are at a distinct disadvantage to regular models on complex tasks that aren't software benchmark targets.