Tiny hackable CUDA language model implementation(github.com)
25 points by markusheimerl 3 days ago | 2 comments
yobbo 3 hours ago

Looks very nice, but I can't find numerical gradient checks, which are helpful for verifying that the backward pass is correct:

https://github.com/markusheimerl/gpt/blob/main/transformer/a...
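
For readers unfamiliar with the idea, here is a minimal sketch of a numerical gradient check. It is not taken from the linked repository: the toy model (a single tanh unit with squared error) and the names `forward`, `backward`, `numerical_grad` and `max_rel_error` are all assumptions made for illustration. The check perturbs each weight by a small `eps`, estimates the gradient with central differences, and compares it to the analytic gradient.

```python
import math

def forward(w, x, y):
    """Toy loss (hypothetical model): 0.5 * (tanh(w . x) - y)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (math.tanh(z) - y) ** 2

def backward(w, x, y):
    """Analytic gradient of the toy loss with respect to each weight."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    a = math.tanh(z)
    dz = (a - y) * (1.0 - a * a)  # chain rule through tanh
    return [dz * xi for xi in x]

def numerical_grad(w, x, y, eps=1e-5):
    """Central differences: (L(w_i + eps) - L(w_i - eps)) / (2 * eps)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((forward(w_plus, x, y) - forward(w_minus, x, y)) / (2 * eps))
    return grad

def max_rel_error(analytic, numeric):
    """Largest relative difference between the two gradient estimates."""
    return max(abs(a - n) / max(abs(a) + abs(n), 1e-12)
               for a, n in zip(analytic, numeric))

# Fixed, arbitrary inputs so the check is reproducible.
w = [0.5, -0.3, 0.8, 0.1]
x = [1.0, 2.0, -1.5, 0.5]
y = 0.3

err = max_rel_error(backward(w, x, y), numerical_grad(w, x, y))
print(f"max relative error: {err:.2e}")
```

A small relative error suggests the analytic gradient agrees with the loss; a large one (for example after dropping the `1.0 - a * a` factor) points to a bug in `backward`.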

markusheimerl 40 minutes ago

I deleted the numerical checks a while back, after confirming the backward pass is correct, to keep the code base lean. Running https://github.com/markusheimerl/gpt/blob/main/transformer/a... is also somewhat of a confirmation that the backward pass is correct, since an analytically incorrect backward pass can't fit synthetic data perfectly.
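
The argument can be illustrated with a small sketch. This is not the repository's test; it is a hypothetical one-variable linear model, and `train`, `mse`, `correct_grad` and `buggy_grad` are names invented here. The data is noise-free, so a correct backward pass should drive the loss to essentially zero, while the deliberately broken one (bias gradient dropped) stalls. It only demonstrates one kind of bug; some mistakes, such as a wrongly scaled gradient, may still converge.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def correct_grad(w, b, xs, ys):
    """Analytic gradient of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def buggy_grad(w, b, xs, ys):
    """Same as correct_grad, but with the bias gradient dropped (the bug)."""
    dw, _ = correct_grad(w, b, xs, ys)
    return dw, 0.0

def train(grad_fn, steps=2000, lr=0.1):
    """Plain gradient descent on noise-free synthetic data from y = 2x - 1."""
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ys = [2.0 * x - 1.0 for x in xs]
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = grad_fn(w, b, xs, ys)
        w -= lr * dw
        b -= lr * db
    return mse(w, b, xs, ys)

loss_ok = train(correct_grad)
loss_bad = train(buggy_grad)
print(f"correct backward pass: final loss {loss_ok:.3e}")
print(f"buggy backward pass:   final loss {loss_bad:.3e}")
```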