zozbot234 4 hours ago

If you can get near 100% utilization out of your own GPUs (i.e., you're letting requests run overnight rather than insisting on any kind of realtime response), it starts to make sense. OpenRouter doesn't have a batched-requests API that would let you exploit that possibility.
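As a concrete illustration, here's a minimal sketch of that overnight-batch pattern, assuming a local vLLM install (the model name and prompts are placeholders, not anything OpenRouter offers):

```python
# Offline batched inference: queue a night's worth of work and let vLLM's
# continuous batching keep the GPU busy, instead of serving realtime requests.
from vllm import LLM, SamplingParams

prompts = [f"Summarize document {i}: ..." for i in range(10_000)]  # overnight-sized queue
params = SamplingParams(temperature=0.7, max_tokens=512)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model choice
outputs = llm.generate(prompts, params)      # batching/scheduling handled internally

for out in outputs:
    print(out.outputs[0].text[:80])
```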

spmurrayzzz 3 hours ago | parent | next [-]

For inference, even with continuous batching, getting 100% MFU is basically impossible in practice. Even the frontier labs struggle with this on highly efficient InfiniBand clusters. It's slightly better with training workloads, just due to all the batching and parallel compute, but still mostly unattainable on consumer rigs (you spend a lot of time waiting on I/O).
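For a sense of scale: MFU is achieved FLOPs over peak FLOPs, and single-stream decode does roughly 2 FLOPs per parameter per token, so even decent throughput barely dents a card's peak. A back-of-the-envelope sketch (numbers are illustrative assumptions, not measurements):

```python
# MFU ≈ (2 * params * tokens/sec) / peak FLOPs -- the standard approximation
# for decode. All figures below are assumed for illustration.
params = 70e9          # model size in parameters
tokens_per_sec = 30.0  # hypothetical decode throughput
peak_flops = 71e12     # roughly an RTX 3090's dense FP16 tensor peak

mfu = (2 * params * tokens_per_sec) / peak_flops
print(f"MFU ≈ {mfu:.1%}")  # ≈ 5.9% -- decode leaves the ALUs mostly idle
```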

I also don't think 100% utilization is necessary, to be fair. I get a lot of value out of my two rigs (2x RTX Pro 6000 and 4x 3090) even if they're not at 100% MFU 24/7. I'm always training, generating datasets, running agents, etc. I'd never consider this a positive ROI measured against the capex, though; that's not really the point.

zozbot234 3 hours ago | parent [-]

Isn't this just saying that your GPU use is bottlenecked by things like VRAM bandwidth and RAM-to-VRAM transfers? That's normal and expected.
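The usual roofline sketch makes the point: at batch size 1, every decoded token streams all the weights through VRAM once, so bandwidth sets the ceiling regardless of compute (numbers below are assumed for illustration):

```python
# Bandwidth roofline for single-stream decode: each token reads every weight
# once, so tokens/sec <= VRAM bandwidth / weight bytes. Assumed figures only.
vram_bandwidth = 936e9    # ~936 GB/s, an RTX 3090's memory bandwidth
model_bytes = 70e9 * 0.5  # 70B parameters at 4-bit ≈ 35 GB of weights

ceiling = vram_bandwidth / model_bytes
print(f"decode ceiling ≈ {ceiling:.0f} tok/s")  # ≈ 27 tok/s at batch size 1
```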

sowbug 3 hours ago | parent | prev [-]

In Silicon Valley we pay PG&E close to 50 cents per kWh. An RTX 6000 PC uses about 1 kW at full load, and renting such a machine from vast.ai costs 60 cents/hour as of this morning. It's very hard for heavy-load local AI to make sense here.
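Spelled out (rates from the comment above; the card price and duty cycle are my assumptions):

```python
# Electricity alone nearly matches the rental rate, before hardware costs.
power_kw = 1.0    # full-load draw of the RTX 6000 box
kwh_rate = 0.50   # $/kWh, PG&E Silicon Valley rate
rent_rate = 0.60  # $/hour, vast.ai quote

energy_per_hour = power_kw * kwh_rate  # $0.50/hour in electricity alone
margin = rent_rate - energy_per_hour   # $0.10/hour left to cover the hardware
card_price = 8_000                     # assumed RTX 6000 street price
print(f"payback ≈ {card_price / margin / 8760:.1f} years of 24/7 use")  # ≈ 9.1 years
```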

btbuildem 3 hours ago | parent | next [-]

Yikes... I pay ~7¢ per kWh in Quebec. In the winter the inference rig doubles as a space heater for the office, so I don't feel bad about running local, energy-wise.

Imustaskforhelp 3 hours ago | parent | prev [-]

And you're forgetting that renting from the likes of vast.ai would STILL be more expensive than OpenRouter's API pricing, and even more so compared to AI subscriptions, which actively LOSE money for the company.

So I'd still side with the GP (original comment): yes, it might not make financial sense to run these AI models locally [it does make sense when you want privacy etc., which are all fair concerns, just not financial ones].

But the fact that these models are open source still means they can be run locally if the dynamics shift in the future and running such large models at home starts to make sense. Even just having that possibility, plus the fact that multiple providers can now compete on OpenRouter and the like, definitely makes me appreciate GLM & Kimi compared to their proprietary counterparts.

Edit: I highly recommend this video: https://www.youtube.com/watch?v=SmYNK0kqaDI [AI subscription vs H100]

It's honestly one of the best I've watched on this topic.

HumanOstrich 3 hours ago | parent [-]

Why did you quote yourself at the end of this comment?

Imustaskforhelp 2 hours ago | parent [-]

Oops, sorry. I'm trying an HN progressive extension that quotes any text I have selected, and I think that's what happened, or some such bug, I'm not sure.

It's fixed now :)