kseniamorph 5 hours ago
Curious about the baseline choice. modded-nanogpt was optimized for wall-clock speed, not data efficiency, so it seems like an unusual reference point for this kind of benchmark. Why not vanilla NanoGPT?
timshel1 4 hours ago
modded-nanogpt is also much more data efficient than vanilla NanoGPT, even if some of its individual optimizations give up some data efficiency in exchange for higher throughput.