It's really interesting how much the AI harness seems to matter. Going from 48% via Google's official results to 65% is a huge jump. I feel like I'm constantly seeing results that compare models and rarely seeing results that compare harnesses.

Is there a leaderboard out there comparing harness results using the same models?

culi an hour ago | parent | next [-]

Maybe the future isn't a human-like centralized intelligence but an octopus-like decentralized intelligence, where more focus is placed on making the harness itself "smart".

	dominotw an hour ago | parent [-]

	That would run counter to AI companies' goals. They want the harness to be dumb and the models to be smart so they can sell models.

manx 3 hours ago | parent | prev | next [-]

We probably want to compare the Cartesian product of model and harness.

GodelNumbering 4 hours ago | parent | prev [-]

I really wish there was! I even thought of creating one, but it would be a conflict of interest.