wongarsu 3 hours ago
It's also a great base for experimentation. If you have an idea for an architecture improvement you can try it for $36 on the 20-layer nanochat setting, then for another $200 see how it holds up on the "full scale" nanochat.

Karpathy's notes on improving nanochat [1] are one of my favorite blog-like things to read. It's really neat to see how much influence each feature has, and how the scaling laws evolve as you improve the architecture.

There's also modded-nanogpt, which turns the same kind of experimentation into a training speedrun (and maybe loses some rigor along the way) [2]

1: https://github.com/karpathy/nanochat/blob/master/dev/LOG.md