dw_arthur 3 hours ago

Everyone should have their own private evals for models. If I ask a question and a model flat-out gets it wrong, I will sometimes add it to my bank of test questions.
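
For anyone who wants to try this, here is a rough sketch of what a private question bank could look like. Everything in it is my own assumption rather than a standard tool: the JSON Lines file, the `add_question` / `load_questions` / `run_evals` names, and the crude "expected string appears in the answer" check. The `ask` argument stands in for whatever function calls your model.

```python
import json
from pathlib import Path
from typing import Callable


def add_question(bank: Path, prompt: str, expected: str) -> None:
    """Append a question (one a model got wrong) to the bank as a JSON line."""
    with bank.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"prompt": prompt, "expected": expected}) + "\n")


def load_questions(bank: Path) -> list[dict]:
    """Read every saved question; a missing file is an empty bank."""
    if not bank.exists():
        return []
    with bank.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def run_evals(bank: Path, ask: Callable[[str], str]) -> float:
    """Return the fraction of questions whose answer contains the expected text.

    `ask` is a hypothetical stand-in for a call to the model under test.
    Substring matching is a deliberately crude grader; swap in your own.
    """
    questions = load_questions(bank)
    if not questions:
        return 0.0
    passed = sum(
        q["expected"].lower() in ask(q["prompt"]).lower() for q in questions
    )
    return passed / len(questions)
```

Keeping the file local and out of any public repo is the point: questions that never get published are much less likely to end up in training data.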