Remix.run Logo
achierius 2 hours ago

Out of curiosity, what sort of things have you seen it do that better fit 'autoresearch' than 'autotune' thus far? Optimizations it made that wouldn't be been surfaced by an autotune system, I suppose.

karpathy 28 minutes ago | parent [-]

The most recent round of autoresearch (round 2) which decreased "time to GPT-2" from 1.8 hours to 1.65 hours had some examples. I adjusted the program.md to "look at modded nanogpt project and draw inspirations from there for things to try" and it came back with a bunch of tuning, but also tried and implemented new architecture changes, some of which actually helped including the smear gate and the backout skip connection. These are not just hyperparameters, they are new PyTorch code. I'm now working on a more general system that can have a queue of ideas that could be sourced from archive papers, github repos, etc.