| ▲ | solenoid0937 3 hours ago | |
They have not. Every recent successful pre-training run has shown performance gains greater than what the scaling laws predict. | ||
| ▲ | 0x3f 3 hours ago | parent [-] | |
Those gains come from architecture changes, data quality, and so on. Scaling laws relate only to data volume and compute, holding other factors constant. | ||
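To make the "holding other factors constant" point concrete, here is a rough sketch of one commonly used parametric form of a scaling law, loss = E + A / N^alpha + B / D^beta, where N is parameter count and D is training tokens. The function name `predicted_loss` and every constant below are illustrative assumptions, not fitted values from any particular paper or model. The idea is that architecture or data-quality improvements change the constants, so a run can beat a curve fitted under the old constants without the law itself being violated.

```python
def predicted_loss(n_params, n_tokens, E=1.7, A=400.0, B=400.0,
                   alpha=0.34, beta=0.28):
    """Parametric scaling-law sketch: loss as a function of model size and data.

    All constants are illustrative placeholders (assumptions), standing in
    for values that would be fitted for one fixed architecture and dataset.
    """
    return E + A / n_params**alpha + B / n_tokens**beta


# Same "recipe" (same constants): only model size and token count vary.
baseline = predicted_loss(n_params=1e9, n_tokens=2e10)
more_data = predicted_loss(n_params=1e9, n_tokens=4e10)

# A hypothetical better architecture modeled as a smaller A at the same N and D.
# This lands below the baseline curve even though N and D are unchanged.
better_arch = predicted_loss(n_params=1e9, n_tokens=2e10, A=300.0)

print(f"baseline:    {baseline:.3f}")
print(f"more data:   {more_data:.3f}")
print(f"better arch: {better_arch:.3f}")
```

In this toy setup, the "better arch" run would look like it outperforms the prediction, but only because it is being compared against a curve fitted with different constants.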