Remix.run Logo
bloaf 6 hours ago

Taichi, benchmarked in the article, claims to be able to outperform CUDA at some GPU tasks, although their benchmarks look to be a few years old:

https://github.com/taichi-dev/taichi_benchmark

pjmlp 6 hours ago | parent [-]

And doesn't account for cuTitle, NVidia's new API infrastructure that supports writing CUDA directly in Python via a JIT that is based on MLIR.