osti 5 hours ago

> propose, implement, measure, keep the wins

That's pretty much what I did when I let Codex with gpt5.4xhigh improve my fairly complex CUDA kernel, which resulted in a 20x throughput improvement.

hackyhacky 5 hours ago

Concretely, what interesting changes did it make to achieve such a significant improvement?

osti 3 hours ago

A lot of it was beyond me, but here are the branch names for everything it tried, most of it unsuccessful, of course. About a 10x perf improvement came from architectural changes, and then another 2x from micro-optimizations.

https://pastebin.com/eac0SAYg