NetOpWibby 2 hours ago

How are they able to compare with Fable when Fable was only available for three days?

Topfi 2 hours ago

Terminalbench numbers are publicly available. The more interesting question is why that is the only benchmark they highlight. Maybe 5.6 isn’t that far ahead of Fable 5 in DeepSWE and FrontierCode (which I consider the most useful, and the closest to my own evals and subjective experience)…