100ms 4 hours ago

Tiny model overfit on benchmark published 3 years prior to its training. News at 10

selimthegrim 3 hours ago | parent | next [-]

It wasn't important enough to make the 11 o'clock program.

bigyabai 4 hours ago | parent | prev | next [-]

But GPT-3.5 was benchmaxxing too.

100ms 4 hours ago | parent [-]

GPT-3.5 Turbo's knowledge cutoff was circa 2021. MT-Bench is from 2023. I'm not suggesting that improvements in small models aren't possible or forthcoming (the 1.85 bit etc. models look exciting), but this almost certainly isn't that.
