inference code is effectively trivial to port at this point
everyone understands CUDA well enough anyway