unchar1 2 hours ago

It's not just about figuring out whether a model is good at things, but whether it's good at the things I care about.

Using a targeted eval suite (like a test suite) tells us that.
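
To make the test-suite analogy concrete, here is a rough sketch of what a targeted eval suite could look like. Everything in it is made up for illustration: `EvalCase`, `run_suite`, `pass_rate`, the prompts, and `stub_model` (a canned stand-in for a real model call) are all hypothetical names, not part of any particular eval framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One targeted check, analogous to a single unit test."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the model output is acceptable


def run_suite(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case against the model and record pass/fail by case name."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(results.values()) / len(results) if results else 0.0


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned answers."""
    canned = {
        "What is 2 + 2?": "4",
        "Name the capital of France.": "Paris is the capital.",
    }
    return canned.get(prompt, "")


# The cases encode the things *you* care about, not a generic benchmark.
CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("geography", "Name the capital of France.", lambda out: "Paris" in out),
    EvalCase("sql", "Write a SQL query counting rows in users.",
             lambda out: "count(" in out.lower()),
]

results = run_suite(stub_model, CASES)
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
print(f"pass rate: {pass_rate(results):.2f}")
```

Swapping `stub_model` for a function that calls an actual model would give a per-case pass/fail report for that model, which is the "is it good at the things I care about" signal.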