timshel1 5 hours ago
Modded-nanogpt is also much more data-efficient than vanilla nanogpt, even if some of its individual optimizations trade away some data efficiency for higher throughput.
sdpmas 5 hours ago
yes, agreed, modded-nanogpt is already a data-efficient variant of the original nanogpt. it's just that the kinds of algorithms it allows are somewhat constrained, because it optimizes for wall-clock time.