nnevatie 5 hours ago

No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.

amorroxic 4 hours ago | parent | next [-]

Curious what backs this assertion. As a counterpoint, we’ve been running 200+ models in production for more than 5 years - language models, embedding, classifiers, low tens to hundred M params. Traffic is on the order of 1-2M requests/day and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What’s your SLA?

snovv_crash 5 hours ago | parent | prev | next [-]

Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.

pzo 2 hours ago | parent | prev | next [-]

How are we supposed to use TensorRT on iOS, iPadOS, Android, or even the Web? Production is not only cloud.

cik 4 minutes ago | parent | prev | next [-]

Ummm embedded robotics is all about this. For years.

monster_truck an hour ago | parent | prev | next [-]

I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.

antonvs an hour ago | parent | prev | next [-]

We use this in production:

https://docs.rs/onnxruntime/latest/onnxruntime/

It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.

gunalx 5 hours ago | parent | prev | next [-]

Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.

OvervCW 4 hours ago | parent | prev [-]

You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
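
For reference, a minimal Python sketch of what that can look like, assuming an `onnxruntime` build that ships the TensorRT execution provider. The helper names `pick_providers` and `make_session` and the `PREFERRED` list are made up for illustration; the provider strings, `get_available_providers()`, and the `providers=` argument come from the ONNX Runtime Python API.

```python
# Preference order: TensorRT first, then CUDA, then CPU as the universal fallback.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Return the preferred providers that this build actually offers, in order."""
    chosen = [p for p in PREFERRED if p in available]
    # Hypothetical safety net: fall back to CPU if nothing in the list matched.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Create an inference session using the best providers available."""
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a machine without TensorRT or CUDA, the same code should end up with only the CPU provider, so the model file and calling code stay the same across targets.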