nnevatie 5 hours ago
No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.
amorroxic 4 hours ago
Curious what backs this assertion. As a counterpoint, we've been running 200+ models in production for more than 5 years: language models, embeddings, classifiers, from the low tens of millions to around a hundred million parameters. Traffic is on the order of 1-2M requests/day, and everything runs on ONNX with some cgo (or Rust) plumbing on top. What's your SLA?
snovv_crash 5 hours ago
Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.
pzo 2 hours ago
How are we supposed to use TensorRT on iOS, iPadOS, Android, or even the Web? Production is not only cloud.
cik 4 minutes ago
Ummm, embedded robotics is all about this. For years.
monster_truck an hour ago
I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.
antonvs an hour ago
We use this in production: https://docs.rs/onnxruntime/latest/onnxruntime/ It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.
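For a rough sense of scale, here is a back-of-the-envelope conversion of the daily volumes quoted in this thread (5M+/day here, 1-2M/day in amorroxic's comment) into average requests per second. It assumes traffic is spread evenly over 24 hours, which real traffic never is; peak rates will be higher, and the thread gives no peak-to-average ratio.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Daily volumes as quoted in the thread; peaks are not modeled here.
    for label, per_day in [("1M/day", 1_000_000), ("2M/day", 2_000_000), ("5M/day", 5_000_000)]:
        print(f"{label}: ~{average_rps(per_day):.1f} req/s on average")
```

This prints roughly 11.6, 23.1, and 57.9 req/s. Spread across many VMs, the average per-instance load is lower still, which is part of why CPU-only inference can be viable at these volumes.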
gunalx 5 hours ago
Production doesn't have to be performance-sensitive, so in some scenarios developer experience may matter more than the performance differences.
OvervCW 4 hours ago
You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
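A minimal sketch of what that looks like from ONNX Runtime's Python side, assuming an onnxruntime build that ships the TensorRT and CUDA execution providers (for example, a GPU build). ONNX Runtime tries providers in the order given, so listing TensorRT first lets it take the parts of the graph it supports, with CUDA and then CPU handling the rest. The helper `pick_providers` is hypothetical; it only filters a preference list against what the installed build reports as available.

```python
from typing import Iterable, List

# Hypothetical preference order: TensorRT first, then plain CUDA, then CPU.
# These strings are ONNX Runtime execution provider names; which ones exist
# depends on how the installed onnxruntime package was built.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available: Iterable[str]) -> List[str]:
    """Return the preferred providers that are actually available, in priority order."""
    available_set = set(available)
    return [p for p in PREFERRED_PROVIDERS if p in available_set]
```

Usage, with `model.onnx` as a placeholder path: `ort.InferenceSession("model.onnx", providers=pick_providers(ort.get_available_providers()))` after `import onnxruntime as ort`; `session.get_providers()` then shows which providers the session actually ended up with. Filtering first avoids asking for a provider the build does not include.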