zozbot234 3 hours ago
H200s and other enterprise datacenter GPUs are complete overkill in any realistic single-user or few-user inference scenario. They're heavily unbalanced towards compute capacity, which will go almost entirely unused (i.e. wasted) unless you're running huge batches continuously. I've argued many times that local inference engines should support batched inference on a somewhat smaller scale, for a variety of reasons (especially given the unexpected effectiveness of SSD-streamed inference with larger-than-RAM models), but even I don't think we can realistically go to a batch size of 300x or so for real-time inference, which is the range that pencils out quite consistently from a simple roofline model of these datacenter cards.
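For anyone who wants to check the roofline claim, here is a minimal sketch of the arithmetic, not a benchmark. It assumes decode is dominated by streaming the weights once per step, that each parameter costs about 2 FLOPs (one multiply-add) per sequence in the batch, and it ignores KV-cache traffic and attention FLOPs. The function name and the hardware figures (dense BF16 throughput and HBM bandwidth) are my own approximations, not vendor-verified numbers, so swap in whatever your datasheet says.

```python
def break_even_batch(peak_flops: float,
                     mem_bandwidth: float,
                     bytes_per_param: float,
                     flops_per_param: float = 2.0) -> float:
    """Batch size at which decode stops being memory-bound.

    Simple roofline: the ridge point is peak_flops / mem_bandwidth
    (FLOP per byte). One decode step reads every weight once
    (bytes_per_param bytes) and does flops_per_param FLOPs per weight
    for each sequence in the batch, so arithmetic intensity grows
    linearly with batch size.
    """
    ridge = peak_flops / mem_bandwidth                 # FLOP per byte
    intensity_per_seq = flops_per_param / bytes_per_param
    return ridge / intensity_per_seq


if __name__ == "__main__":
    # ASSUMED, approximate figures; check the datasheet for your card.
    cards = {
        "H100-class (assumed 989 TFLOPS, 3.35 TB/s)": (989e12, 3.35e12),
        "H200-class (assumed 989 TFLOPS, 4.8 TB/s)": (989e12, 4.8e12),
    }
    for name, (flops, bandwidth) in cards.items():
        # 2 bytes per parameter, i.e. FP16/BF16 weights
        batch = break_even_batch(flops, bandwidth, bytes_per_param=2.0)
        print(f"{name}: break-even batch of about {batch:.0f}")
```

With those assumed figures the break-even batch lands in the low hundreds, so a single user decoding one sequence at a time leaves nearly all of the compute idle. Quantized weights shift the number (fewer bytes per parameter means higher intensity per sequence), but not the order of magnitude.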
echelon 2 hours ago
If you're doing professional work in coding or video, you can easily saturate a single H200. This is what RunPod-type services are for.

For instance, ComfyUI is an abomination that can't do half of what Nano Banana and Seedance 2.0 can do. And you have to sit around and wait 10x longer for single results.

I can rent an H200 for $3.50 an hour. That's INSANELY cheap.

I do not understand this split between hosted APIs and rinky-dink local RTX models. Both suck. The ideal solution is models we own run on RunPods leveraging H200s. I can spend $100-200/day on compute making much more value with the model outputs.

----

edit: I want to respond to comments, but the damned HN rate limits keep me to five comments a day now because I'm a contrarian and say things that rile up the anti-AI folks.

You don't need to buy an H200. It's a depreciating asset. You rent one. It's cheap to rent.