Remix clone Hacker News

	▲	nnevatie 5 hours ago
		No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.
	▲	amorroxic 4 hours ago
		Curious what backs this assertion. As a counterpoint, we’ve been running 200+ models in production for more than 5 years: language models, embedding models, classifiers, from the low tens to a hundred million params. Traffic is on the order of 1-2M requests/day, and everything is enabled by ONNX with some cgo (or Rust) plumbing on top. What’s your SLA?
	▲	snovv_crash 5 hours ago
		Strong statement to make when I have at least 2 datapoints contradicting it, in SaaS and embedded/robotics.
	▲	pzo 2 hours ago
		How are we supposed to use TensorRT on iOS, iPadOS, Android, or even the Web? Production is not only cloud.
	▲	cik 4 minutes ago
		Ummm embedded robotics is all about this. For years.
	▲	monster_truck an hour ago
		I've never understood how anyone comes into contact with it and thinks it's anything more than an incredible inconvenience masked as the easy way of doing things. I've given it a few good shakes for various uses and regretted the time spent each time.
	▲	antonvs an hour ago
		We use this in production: https://docs.rs/onnxruntime/latest/onnxruntime/ It’s a Rust wrapper around ONNX Runtime. We currently serve 5+ million inference requests per day for a highly performance-sensitive application, for a long list of major enterprise clients. We don’t use GPUs for inference, because it would be cost-prohibitive. We launch tens of thousands of VMs per day to run these workloads.
	▲	gunalx 5 hours ago
		Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.
	▲	OvervCW 4 hours ago
		You can use ONNXRuntime with a TensorRT backend, so one does not exclude the other.
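		For anyone who hasn't seen it, here is a rough sketch of what that looks like with the Python bindings. ONNX Runtime takes an ordered list of execution providers and falls back down the list. The helper names `pick_providers` and `make_session` and the preference order are made up for illustration, and whether the TensorRT provider is actually available depends on how your onnxruntime build was compiled.

```python
# Preference order is an assumption for this sketch: TensorRT, then CUDA, then CPU.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are available, in priority order.

    Falls back to the CPU provider if none of the preferred ones are present.
    """
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Hypothetical helper: open a session on the best available backend."""
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

		On a CPU-only box `pick_providers` just returns the CPU provider, so the same code runs on a laptop and on a GPU server with a TensorRT-enabled build.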