jjani 5 days ago
That sounds incredibly disappointing given how high their benchmark scores are. It suggests they might be overtuned for those benchmarks, similar to Llama4.
XCSme 5 days ago | parent
Yeah, I think so too. They seemed better at specific tasks but worse overall on broader ones.