akshayvegesna 9 hours ago
You seem to be making two points:

- synthetic data is a valuable direction to pursue when you have compute
- Chinchilla scaling laws have some flaws for small models

Both of these are side points to the core purpose of the Slowrun. The main point is that the 100M tokens we train on push people to come up with novel ideas for improving pretraining, beyond facile synthetic data generation. I think we should continue to push on synthetic data, but why not come up with some new ideas too? You cannot use synthetic data for everything (see sdpmas's point).