thornewolf 3 days ago

LLM inference is getting cheaper year over year. Providers often lose money on it now, but they may eventually stop losing money once it gets cheap enough to run.

- But surely the race to the bottom will continue?

Maybe, but they also offer a consumer subscription whose price can diverge from actual serving costs.

/speculation

lasermike026 3 days ago

I'm working with models and the costs are ridiculous. A $7000 card and 800 watts later for my small projects, and I can't imagine how they can make money in the next 5 to 10 years. I need to do more research on hardware approaches that reduce cost and power consumption. I just started experimenting with llama.cpp and I'm mildly impressed.
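For context, here's a rough back-of-envelope sketch (Python) of the power side of that setup. The electricity rate and duty cycle are assumptions for illustration, not figures from the comment above:

    # Back-of-envelope: electricity cost of running one ~800 W GPU box.
    # Only the 800 W draw and the $7000 card price come from the comment;
    # the rate and hours below are assumed for illustration.

    POWER_WATTS = 800                 # draw under load (from the comment)
    ELECTRICITY_USD_PER_KWH = 0.15    # assumed residential rate
    HOURS_PER_DAY = 8                 # assumed duty cycle for "small projects"

    daily_kwh = POWER_WATTS / 1000 * HOURS_PER_DAY
    daily_cost = daily_kwh * ELECTRICITY_USD_PER_KWH
    monthly_cost = daily_cost * 30

    print(f"~{daily_kwh:.1f} kWh/day -> ${daily_cost:.2f}/day, ~${monthly_cost:.0f}/month in power")

Under those assumptions power is only a few dollars a day; amortizing the $7000 card dominates the cost unless it is kept busy for years.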

Palmik 3 days ago

Looking at API providers like Together that host open-source models such as Llama 70B, and having run these models in production myself, I'd say they have healthy margins (and their inference stack is much better optimized than mine).
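A minimal sketch of how one might sanity-check that margin claim. Every figure below (GPU-hour cost, GPU count, throughput, API price) is a hypothetical assumption, not a number from Together or this thread:

    # Rough per-token serving cost vs. API price for a hypothetical 70B deployment.
    # All figures are assumptions chosen only to illustrate the arithmetic.

    GPU_HOUR_USD = 2.00          # assumed cost of one datacenter GPU-hour
    GPUS_PER_REPLICA = 4         # assumed GPUs needed to serve a 70B model
    TOKENS_PER_SEC = 8000        # assumed aggregate throughput with heavy batching
    API_PRICE_PER_MTOK = 0.90    # assumed API price per million tokens

    tokens_per_hour = TOKENS_PER_SEC * 3600
    cost_per_mtok = (GPU_HOUR_USD * GPUS_PER_REPLICA) / (tokens_per_hour / 1_000_000)
    margin = 1 - cost_per_mtok / API_PRICE_PER_MTOK

    print(f"serving cost ~${cost_per_mtok:.2f}/Mtok vs price ${API_PRICE_PER_MTOK:.2f}/Mtok "
          f"-> gross margin ~{margin:.0%} (under these assumptions)")

The point is the shape of the calculation: with good batching the GPU-hour cost gets spread over a lot of tokens, which is why a well-optimized inference stack can be profitable at prices that look low.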