0x3f 4 hours ago

All the curves have been levelling off as expected. Not really sure what you're talking about.

solenoid0937 3 hours ago | parent [-]

They have not. Every successful pre-train as of late has had performance increases greater than what the scaling laws predict.

0x3f 3 hours ago | parent [-]

Those gains are architecture-based, data-quality-based, etc. Scaling laws only relate performance to data volume and compute, holding other factors constant.
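
To make the "holding other factors constant" point concrete, here is a rough sketch using a commonly cited parametric form, loss = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the training token count. The function name and every constant below are hypothetical round numbers chosen for illustration, not fitted values from any paper or model.

```python
# Sketch of a parametric scaling law: loss = E + A / N**alpha + B / D**beta.
# All constants are illustrative assumptions, not fitted values.

def predicted_loss(n_params, n_tokens, *, e=1.7, a=400.0, b=400.0,
                   alpha=0.34, beta=0.28):
    """Predicted loss for a fixed training recipe (fixed E, A, B, alpha, beta)."""
    return e + a / n_params ** alpha + b / n_tokens ** beta


# Same recipe, 10x more parameters and tokens: we move along one curve.
baseline = predicted_loss(1e9, 2e10)
scaled = predicted_loss(1e10, 2e11)

# Same size and data volume, but a hypothetical better recipe (architecture,
# data quality) modeled as smaller A and B: a different curve entirely.
better_recipe = predicted_loss(1e9, 2e10, a=300.0, b=300.0)

print(f"baseline:      {baseline:.3f}")
print(f"scaled up:     {scaled:.3f}")
print(f"better recipe: {better_recipe:.3f}")
```

In this framing, a run that beats the prediction at a given N and D is evidence that the constants changed, not that the curve for the old recipe stopped flattening.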