alansaber an hour ago

Guessing they included some smaller models just to show how accuracy drops off at smaller context sizes

jasonjmcghee an hour ago | parent [-]

Sure - I was more commenting that they are all > 6 months old, which sounds silly, but things have been changing fast, and instruction following is definitely an area that has been developing a lot recently. I would be surprised if accuracy still drops off that hard.

0xblacklight 16 minutes ago | parent [-]

I imagine it’s highly correlated with parameter count, but the research is a few months old and frontier model architecture is pretty opaque, so it’s hard to draw too many conclusions about newer models that aren’t in the study beyond what I wrote in the post