mdasen 4 hours ago
It's really interesting how much the AI harness seems to matter. Going from 48% in Google's official results to 65% is a huge jump. I feel like I'm constantly seeing results that compare models and rarely seeing results that compare harnesses. Is there a leaderboard out there that compares harnesses running the same models?
culi an hour ago
Maybe the future isn't a human-like centralized intelligence but an octopus-like decentralized intelligence, where more focus is placed on making the harness itself "smart".
manx 3 hours ago
We probably want to compare the full Cartesian product of models × harnesses.
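A minimal sketch of what that evaluation grid looks like, assuming hypothetical placeholder names for the models and harnesses (they are not real products). Every model is paired with every harness, so the number of benchmark runs is the product of the two counts, and holding the model fixed across a row isolates the harness effect.

```python
from itertools import product

# Hypothetical placeholder names; substitute the real models and harnesses under test.
MODELS = ["model-a", "model-b"]
HARNESSES = ["harness-x", "harness-y", "harness-z"]


def evaluation_grid(models, harnesses):
    """Return every (model, harness) pair so each combination gets its own benchmark run."""
    return list(product(models, harnesses))


grid = evaluation_grid(MODELS, HARNESSES)
for model, harness in grid:
    # A real leaderboard would run the benchmark here; this sketch only lists the runs.
    print(f"{model} + {harness}")
print(len(grid))  # 2 models x 3 harnesses = 6 runs
```

The grid grows multiplicatively, which is one reason leaderboards tend to fix the harness and vary only the model.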
GodelNumbering 4 hours ago
I really wish there was! I even thought of creating one, but it would be a conflict of interest.