NanoGPT Slowrun: Language Modeling with Limited Data, Infinite Compute (qlabs.sh)
82 points by sdpmas 4 hours ago | 9 comments
kseniamorph 44 minutes ago:
Curious about the baseline choice. modded-nanogpt was optimized for wall-clock speed, not data efficiency, so it seems like an unusual reference point for this kind of benchmark. Why not vanilla NanoGPT?
archermarks 2 hours ago:
Very cool idea. Interested to see how this progresses. One question: how worried are you about over-training on this particular dataset, i.e. memorizing it rather than generalizing? You do hold out a validation set, but since you're meta-optimizing the model itself based on its validation performance, you're still at risk of over-fitting to it.
lzaborowski 2 hours ago:
I like the idea of flipping the constraint. Most ML benchmarks assume unlimited data and limited compute, so people optimize for speed. If high-quality training data becomes the real bottleneck, then the interesting question is how much signal you can extract from the same dataset when compute is cheap.
suddenlybananas 3 hours ago:
Reminds me a fair bit of the BabyLM challenge. It would be good to give them a shout-out and explain how this challenge differs.
navvyeanand 2 hours ago:
Amazing job!
riajain2525 an hour ago:
Super cool!