For other people reading this:
Back in the day there was tfdeploy for TensorFlow, which converted TensorFlow graphs into plain NumPy matrix operations. In our synthetic tests we saw speedups of about a factor of 50.
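
To give a rough idea of what "NumPy matrix operations" means here, this is a minimal sketch of a small dense network evaluated with NumPy only. It is not tfdeploy's API or its generated code; the function names, layer sizes, and random weights are all made up for illustration (in practice the weights would come from the trained model).

```python
import numpy as np


def dense(x, w, b):
    # Fully connected layer: one matrix multiply plus a bias add.
    return x @ w + b


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    # Subtract the row-wise max first for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x, params):
    # Hypothetical two-layer network: dense -> relu -> dense -> softmax.
    h = relu(dense(x, params["w1"], params["b1"]))
    return softmax(dense(h, params["w2"], params["b2"]))


# Placeholder weights; these stand in for values exported from a trained graph.
rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(size=(4, 8)),
    "b1": np.zeros(8),
    "w2": rng.normal(size=(8, 2)),
    "b2": np.zeros(2),
}

batch = rng.normal(size=(3, 4))
probs = forward(batch, params)
print(probs.shape)
```

With everything reduced to a handful of array operations like these, evaluating a small model skips the per-call overhead of a TensorFlow session, which is one plausible reason for the kind of speedup we measured.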