maniacwhat 7 hours ago

What it's saying is that if you look at any single model, it can be beaten by an ensemble of weaker models. E.g., Fable 5 is beaten by an ensemble of previous-gen models.

JumpCrisscross 7 hours ago | parent [-]

I guess so. 4.8 + 4.8 > Fable 5 is interesting, though not particularly game-changing. (The others all fuse frontier models, which is an argument for using those frontier models more, not less.)

pants2 6 hours ago | parent [-]

Yeah, all that's really saying is that a weaker model with a better harness can beat a stronger model with a worse harness, specifically on the DRACO benchmark.

This isn't really a surprising result. Needs more evidence to make a broader claim.