antonvs an hour ago
We use this in production: https://docs.rs/onnxruntime/latest/onnxruntime/ It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.