etaioinshrdlu | 20 hours ago:
This is wrong because LLMs have been cheap enough to run profitably on ads alone (search-style or banner-ad-style) for over two years now, and they are getting cheaper over time for the same quality. It is even cheaper to serve an LLM answer than to call a web search API! Zero chance all the users evaporate unless something much better comes along, or the tech is banned, etc.

scubbo | 20 hours ago:
> LLMs are cheap enough to run profitably on ads alone

> It is even cheaper to serve an LLM answer than call a web search API

These, uhhhh, these are some rather extraordinary claims. Got some extraordinary evidence to go along with them?

etaioinshrdlu | 19 hours ago:
I've operated a top ~20 LLM service for over 2 years, very comfortably profitable with ads. As for the pure costs: you can measure the cost of getting an LLM answer from, say, OpenAI, and the equivalent search query from Bing/Google/Exa will cost over 10x more.

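A rough sketch of the kind of per-query comparison being made here. All prices and token counts below are illustrative assumptions, not figures from this thread; actual pricing varies by provider, model tier, and search API plan.

    # Back-of-the-envelope: cost of serving one short LLM answer vs. one
    # web search API call. Prices are assumed for illustration only.
    LLM_INPUT_PER_M = 0.15   # assumed $ per 1M input tokens (small-model tier)
    LLM_OUTPUT_PER_M = 0.60  # assumed $ per 1M output tokens
    SEARCH_PER_K = 15.00     # assumed $ per 1,000 search API queries

    def llm_answer_cost(prompt_tokens: int = 200, answer_tokens: int = 500) -> float:
        """Dollar cost of one LLM answer for the given token counts."""
        return (prompt_tokens * LLM_INPUT_PER_M +
                answer_tokens * LLM_OUTPUT_PER_M) / 1_000_000

    def search_query_cost() -> float:
        """Dollar cost of one search API call."""
        return SEARCH_PER_K / 1_000

    llm = llm_answer_cost()       # ~$0.00033
    search = search_query_cost()  # ~$0.015
    print(f"LLM answer:   ${llm:.5f}")
    print(f"Search query: ${search:.5f}")
    print(f"Search costs ~{search / llm:.0f}x more per query")

Under these assumed prices a short answer from a small model is a few hundredths of a cent, while a metered search API call is on the order of a cent or more, which is the direction of the claim above; the exact multiple depends entirely on which model and search tier you plug in.
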
johnecheck | 13 hours ago:
So you don't have any real info on the costs. The question is what OpenAI's profit margin is here, not yours. The theory is that these costs are subsidized by a flow of money from VCs and big tech as they race.

How cheap is inference, really? What about "thinking" inference? What are the prices going to be once growth starts to slow and investors start demanding returns on their billions?

jsnell | 12 hours ago:
Every indication we have is that pay-per-token APIs are not subsidized or merely break-even, but have very high margins. The market dynamics are such that subsidizing those APIs wouldn't make much sense. The unprofitability of the frontier labs is mostly due to them not monetizing the majority of their consumer traffic at all.

etaioinshrdlu | 9 hours ago:
It would be profitable even if we self-hosted the LLMs, which we've done. The only thing subsidized is the training costs. So maybe people will one day stop training AI models.

throwawayoldie | 11 hours ago:
So you're not running an LLM, you're running a service built on top of a subsidized API.

clarinificator | 18 hours ago:
Profitably covering R&D, or profitably using the subsidized models?

haiku2077 | 20 hours ago:
https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch... (also note the "objections" section).

Anecdotally, thanks to hardware advancements (Moore's law), the locally-run AI software I develop has gotten more than 100x faster in the past year.

oblio | 20 hours ago:
What hardware advancement? There's hardly any these days, especially not for this kind of computing.

Sebguer | 19 hours ago:
Have you heard of TPUs?

Dylan16807 | 19 hours ago:
Sort of a hardware advancement. I'd say it's more of a sidegrade between different types of well-established processor: take out a couple of cores, put in some extra-wide matrix units with accumulators, and watch the neural nets fly. But I want to point out that going from CPU to TPU is basically the opposite of a Moore's law improvement.

oblio | 19 hours ago:
Yeah, I'm a regular Joe. How do I get one and how much does it cost?

Dylan16807 | 18 hours ago:
If your goal is "a TPU", then you buy a Mac or anything labeled Copilot+. You'll need about $600. RAM is likely to be your main limit. (A mid- to high-end GPU can get similar or better performance, but it's a lot harder to get more RAM.)

haiku2077 | 18 hours ago:
$500 if you catch a sale at Costco or Best Buy!

oblio | 17 hours ago:
I want something I can put in my own PC. GPUs are utterly insane in pricing, since for the good stuff you need at least 16 GB, but probably a lot more.

Dylan16807 | 17 hours ago:
9060 XT 16GB: $360
5060 Ti 16GB: $450

If you want more than 16GB, that's when it gets bad. And you should be able to get two and load half your model into each. It should be about the same speed as if a single card had 32GB.

oblio | 4 hours ago:
> And you should be able to get two and load half your model into each. It should be about the same speed as if a single card had 32GB.

This seems super duper expensive and not really supported by the more reasonably priced Nvidia cards, though. SLI is deprecated, NVLink isn't available everywhere, etc.

Dylan16807 | 3 hours ago:
No, no, nothing like that. Every layer of an LLM runs separately and sequentially, and there isn't much data transfer between layers. If you wanted to, you could put each layer on a separate GPU with no real penalty. A single request will only run on one GPU at a time, so it won't go faster than a single GPU with a big RAM upgrade, but it won't go slower either.

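A minimal sketch of that layer-wise split, assuming PyTorch and two CUDA devices; toy linear layers stand in for transformer blocks, and real inference stacks expose similar placement options.

    # Split a stack of layers across two GPUs: each layer lives on one device,
    # and only the small activation tensor crosses the GPU boundary.
    import torch
    import torch.nn as nn

    HIDDEN = 4096
    NUM_LAYERS = 8
    devices = ["cuda:0", "cuda:1"]  # assumption: two CUDA GPUs are available

    # First half of the layers on GPU 0, second half on GPU 1.
    layers = [
        nn.Linear(HIDDEN, HIDDEN).to(devices[0] if i < NUM_LAYERS // 2 else devices[1])
        for i in range(NUM_LAYERS)
    ]

    def forward(x: torch.Tensor) -> torch.Tensor:
        # Layers run one after another; the hidden state hops devices exactly
        # once, at the halfway point.
        for layer in layers:
            x = x.to(next(layer.parameters()).device)
            x = layer(x)
        return x

    out = forward(torch.randn(1, HIDDEN, device=devices[0]))
    print(out.shape, out.device)  # torch.Size([1, 4096]) cuda:1

As the comment says, a single request still visits the layers sequentially, so the second GPU buys memory capacity rather than speed; the only cross-GPU traffic is the activation handoff at the split point.
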
haiku2077 | 19 hours ago:
Specifically, I upgraded my Mac and ported my software, which ran on Windows/Linux, to macOS and Metal. Literally >100x faster in benchmarks, and overall user workflows became fast enough that I had to "spend" the performance elsewhere, or else the responses became so fast they were kind of creepy. Have a bunch of _very_ happy users running the software 24/7 on Mac Minis now.

oblio | 4 hours ago:
The thing is, these kinds of optimizations happen all the time. Some of them can be as simple as using a hashmap instead of some home-baked data structure. So what you're describing is not necessarily an LLM-specific improvement (though in your case it is; we can't generalize to every migration of a feature to an LLM).

And nothing I've seen about recent GPUs or TPUs from ANY maker (Nvidia, AMD, Google, Amazon, etc.) claims general speedups of 100x. Heck, even across multiple generations of these very new hardware categories, for example Amazon's Inferentia/Trainium, the vendors' own quite bold claims would probably put the most recent generations at best at 10x the first ones. And as we all know, all vendors exaggerate the performance of their products.

lumost | 20 hours ago:
The free tiers might be tough to sustain, but it's hard to imagine that they are that problematic for OpenAI et al. GPUs will become cheaper, and smaller/faster models will reach the same level of capability.

throwawayoldie | 10 hours ago:
[citation needed]

jdiff | 9 hours ago:
Eh, I kinda see what they're saying. GPUs haven't become cheaper at all, but they have increased in performance, and the amount of performance you get for each dollar spent has increased.

Relative to its siblings, though, things have gotten worse. A GTX 970 could hit 60% of the performance of the full Titan X at 35% of the price. A 5070 hits 40% of a full 5090 for 27% of the price. That's less series-relative performance overall, for a price that's higher by about $100 when adjusting for inflation.

But if you have a fixed performance baseline you need to hit, then as long as the tech keeps improving, hitting that baseline will eventually get cheaper, provided you aren't also trying to improve in a way that moves the baseline up. Which, so far, has been the only consistent MO of the AI industry.

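A quick worked version of that series-relative comparison, using only the percentage figures from the comment; the derived ratios are just the two divisions.

    # How much of the flagship's performance each card delivers per unit of
    # the flagship's price, using the figures quoted above.
    cards = {
        # name: (fraction of flagship performance, fraction of flagship price)
        "GTX 970 vs Titan X": (0.60, 0.35),
        "RTX 5070 vs RTX 5090": (0.40, 0.27),
    }

    for name, (rel_perf, rel_price) in cards.items():
        # >1.0 means proportionally more performance than you pay for,
        # compared with buying the flagship outright.
        print(f"{name}: {rel_perf / rel_price:.2f}x")
    # GTX 970 vs Titan X: 1.71x
    # RTX 5070 vs RTX 5090: 1.48x

By this yardstick the 970 delivered noticeably more of its flagship's performance per relative dollar than the 5070 does, which is the "relative to its siblings" point; absolute performance per dollar across generations has still gone up.
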