	antonvs an hour ago
		We use https://docs.rs/onnxruntime/latest/onnxruntime/ in production. It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, across a long list of major enterprise clients. We don’t use GPUs for inference because that would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.
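For a sense of scale, here is a back-of-the-envelope sketch that converts the quoted daily volume into an average request rate. It assumes load is spread evenly over 24 hours, which real traffic rarely is, so peak rates are likely higher. The function name `average_rps` is made up for illustration, and 5 million is only the lower bound implied by "5+ million".

```rust
/// Average requests per second implied by a daily request count,
/// assuming (unrealistically) perfectly even load across the day.
fn average_rps(requests_per_day: u64) -> f64 {
    const SECONDS_PER_DAY: f64 = 86_400.0; // 24 * 60 * 60
    requests_per_day as f64 / SECONDS_PER_DAY
}

fn main() {
    // Lower bound taken from the comment ("5+ million" per day).
    let rps = average_rps(5_000_000);
    println!("average: {:.1} requests/second", rps); // prints "average: 57.9 requests/second"
}
```

That is roughly 58 requests per second on average across the whole fleet. The comment gives no per-VM or peak figures, so nothing finer-grained can be derived from it.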