aykutseker 3 hours ago

flash beating the pro it was distilled from is suspicious, not surprising. distillation usually loses you something. if the smaller model is winning on agentic evals, the more likely read is that the evals weren't measuring agent quality in the first place. that's the bigger problem for builders, not which model to pick.

xnx 3 hours ago

> flash beating the pro it was distilled from is suspicious

Is it? I thought Flash 3.5 was beating 3.1 Pro.