antirez an hour ago

Send patches! But remember that many speedups end up being not exactly correct, and the logits drift. There is extensive testing, though, and now even ds4-eval to check how it performs.

embedding-shape 42 minutes ago

Hah, these are quick hacks for me to understand CUDA better; I'm unlikely to have time to make them proper enough :( But maybe opening an issue describing what I tried and what worked makes sense.

I did confirm there is no logits drift, since you have so nicely provided tooling for ensuring exactly this. Thanks for the great care that has obviously gone into the project; it's been a pleasure to play around with! :)