spwa4 3 hours ago

Cool. But this makes me wonder. This negates most of the advantages of C. Is there a compiler-autograd "library"? Something that would compile into C specifically to execute as fast as possible on CPUs with no indirection at all.

thechao 2 hours ago

At best you'd be restricted to forward mode, which would still double stack pressure. If you needed reverse mode you'd need 2x stack, and the back sweep over the stack-based tape would have a nearly perfectly suboptimal "grain". If you allow the higher-order operators (both pushforward and pullback), you're going to end up with Jacobians and Hessians over nontrivial blocks. That's going to need the heap. It's still better than an unbounded loop tape, though.

We had all these issues back in 2006 when my group was implementing autograd for C++ and, later, for a computer algebra system called Axiom. We knew it'd be ideal for NNs; I was trying to build this out for my brother, who was porting AI models to GPUs. (In 2006 this didn't work, for both hardware and math reasons.)

attractivechaos 2 hours ago

> Is there a compiler-autograd "library"?

Do you mean the approach Theano uses? Anyway, the performance bottleneck is usually matrix multiplication or 2D convolution (which can be reduced to matmul), so compiler autograd wouldn't save much time.
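The reduction of 2D convolution to matmul is commonly done with im2col. A minimal sketch, assuming a single channel, stride 1 and no padding; the names `im2col`, `matmul` and `conv2d_via_matmul` are illustrative, and the naive triple loop stands in for a tuned BLAS GEMM, which is where the time actually goes.

```c
#include <assert.h>
#include <stddef.h>

/* Unroll every k x k patch of an h x w image (row-major) into one
   column of a (k*k) x (oh*ow) matrix, oh = h-k+1, ow = w-k+1. */
static void im2col(const float *img, size_t h, size_t w, size_t k, float *cols)
{
    assert(k >= 1 && k <= h && k <= w);
    size_t oh = h - k + 1, ow = w - k + 1;
    for (size_t ki = 0; ki < k; ki++)
        for (size_t kj = 0; kj < k; kj++) {
            size_t row = ki * k + kj; /* which kernel tap */
            for (size_t oi = 0; oi < oh; oi++)
                for (size_t oj = 0; oj < ow; oj++)
                    cols[row * (oh * ow) + oi * ow + oj] =
                        img[(oi + ki) * w + (oj + kj)];
        }
}

/* Naive c = a (m x n) * b (n x p), all row-major. */
static void matmul(const float *a, const float *b, float *c,
                   size_t m, size_t n, size_t p)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < p; j++) {
            float acc = 0.0f;
            for (size_t t = 0; t < n; t++)
                acc += a[i * n + t] * b[t * p + j];
            c[i * p + j] = acc;
        }
}

/* CNN-style convolution (cross-correlation): the kernel as a
   1 x (k*k) row times the im2col matrix. `cols` must hold
   k*k*oh*ow floats and `out` must hold oh*ow floats. */
static void conv2d_via_matmul(const float *img, size_t h, size_t w,
                              const float *kernel, size_t k,
                              float *cols, float *out)
{
    im2col(img, h, w, k, cols);
    matmul(kernel, cols, out, 1, k * k, (h - k + 1) * (w - k + 1));
}
```

With a 3x3 image holding 1..9 and the 2x2 kernel {1, 0, 0, 1}, `out` comes out as {6, 8, 12, 14}. The price is the `cols` buffer of k*k*oh*ow floats, traded for handing the heavy lifting to a single GEMM call.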

sueszli 2 hours ago

a heap-free implementation could be a really cool direction to explore. thanks!

i think you might be interested in MLIR/IREE: https://github.com/openxla/iree