7777777phil 3 hours ago

As soon as tokens stop being subsidized, heavy agentic use will become at least as expensive as paying an (entry-level) employee. When this happens, many companies will trade heavy token usage for (maybe a bit slower, a bit less accurate) employees again.

Wowfunhappy 2 hours ago | parent | next [-]

DeepSeek is an open weights model. It's possible the hosted versions are subsidized, but we know what it costs to run locally. And it's expensive, but it's also pretty clearly cheaper than an employee.

Of course, the latest DeepSeek models are not as good as Claude, but they're not super far off either.

amluto an hour ago | parent | next [-]

When you use DeepSeek’s first-party API, you are giving them your token stream. This has some training value, but it also has incredible amounts of, well, business intelligence value. When you tell AWS your secrets or your customer data, you can be fairly confident they won’t abuse that knowledge. When you give this data to, say, OpenAI, they more or less promise not to abuse it if you’re on an appropriate business plan. If you give it to DeepSeek, even incidentally as something your agent reads, I would be quite surprised if DeepSeek doesn’t mine it for whatever purpose they or the government feel is appropriate.

The risk of letting your agent read .env goes far beyond the risk that the agent itself does something you don’t like with the contents.

Wowfunhappy an hour ago | parent [-]

But this shouldn't be a risk if you host the model locally.

irishcoffee 2 hours ago | parent | prev [-]

They're not far off, but getting the same seamless integration as hosted models is a full-time job. I think what just happened is that devops is about to explode. What will naturally follow is local hosting of all the things when people realize subscription costs for cloud-whatever are absurd.

Gitlab is going to take off? This is not investment advice.

Wowfunhappy 2 hours ago | parent [-]

> What will naturally follow is local hosting of all the things when people realize subscription costs for cloud-whatever are absurd.

Even acknowledging we don't know exactly what costs would look like in a world without VC money, wouldn't hosting models logically be cheaper to do at scale in a data center?

When I compared to the cost of running DeepSeek locally, I meant that we can treat that cost as a price ceiling, not the floor.

Groxx 2 hours ago | parent [-]

Like how server hosting at scale in a datacenter is cheaper than running your own datacenter? Despite ~every company consistently concluding that hosting their own stuff is several multiples cheaper?

No, I think local stuff using also-useful-for-other-things hardware will vastly undercut cloud hosting when the free money pipeline shuts down, and will stay that way for roughly forever. That doesn't mean cloud stuff isn't useful, clearly it is, but adding another company in the middle is rarely the solution for reducing costs.

an hour ago | parent [-]
[deleted]
stult 2 hours ago | parent | prev | next [-]

You're assuming the price won't come down as the tech matures. That seems like a big assumption, considering how quickly open weights models are catching up to frontier models, and how little effort has been invested so far in optimizing inference costs.

It's especially a crazy assumption to make relative to the costs of employing a human. The costs of paying an entry level employee are unlikely to go down at all, and even if those costs do decline, there's a floor they can't drop below (minimum wage at the extreme end), whereas companies are free to optimize agentic costs as close to zero as possible.

So you are assuming that a cost which is extremely susceptible to optimization but which no one has yet seriously attempted to minimize will remain perpetually above a cost which is much less susceptible to optimization, is already subject to enormous efforts to minimize, and has a legally mandated floor. That seems like a bad bet.

skybrian 2 hours ago | parent | prev | next [-]

Maybe this just counts as “light use” since I’m a hobbyist programmer and I only run one coding agent session at a time, but I get about as much done as I did back when I was working while spending a lot of time browsing the Internet, etc.

I’ve spent $10-$20 a day using Claude to write code and closer to $5 a day now that I mostly use Deepseek and GLM, using API pricing (no subscriptions) since I don’t use Claude Code.

This is a rounding error for a company. So I think there’s plenty of room to use AI extensively while being more cost-conscious.

kingstnap 2 hours ago | parent | prev | next [-]

A significant caveat is that there is a pricing mismatch that lets first parties subsidize quite heavily.

Agents are expensive in large part because tool calls require round trips. These APIs are stateless and not streaming, so you have to resend the whole context each time. This means you have roughly (#tool calls) x (1/2 context size) cached input tokens over any given session. Most API providers overcharge you by a huge amount for cached tokens, an exception being DeepSeek. Paying OpenAI $0.05 for 100k cached GPT5.5 tokens during a possibly 2-second round-trip agent tool call is like paying $100/hr for what is likely to be ~10 to 20 GB of VRAM residence (holding the KV cache).

Or it got offloaded to NVMe and you are paying $0.05 for that much PCIe bandwidth.
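
To make the arithmetic above concrete, here is a rough Python sketch. The $0.05 per 100k cached tokens and the 2-second round trip come from the comment; the 50 tool calls and the function names are hypothetical, picked only for illustration.

```python
def cached_tokens_resent(tool_calls: int, final_context_tokens: int) -> int:
    """Approximate cached input tokens billed over one session.

    Assumes the context grows roughly linearly, so the average context
    re-sent per tool call is half the final context size.
    """
    return tool_calls * final_context_tokens // 2


def implied_hourly_rate(cost_per_call_usd: float, round_trip_seconds: float) -> float:
    """Hourly rate implied by paying a fixed cost for one round trip."""
    return cost_per_call_usd * 3600 / round_trip_seconds


# Figures from the comment: $0.05 per 100k cached tokens, ~2 s round trip.
price_per_cached_token = 0.05 / 100_000

# Hypothetical session: 50 tool calls ending at a 100k-token context.
tokens = cached_tokens_resent(tool_calls=50, final_context_tokens=100_000)
session_cost = tokens * price_per_cached_token

print(f"cached tokens re-sent: {tokens:,}")
print(f"cached-input cost for the session: ${session_cost:.2f}")
print(f"implied rate for holding the cache: ${implied_hourly_rate(0.05, 2.0):.0f}/hr")
```

The implied rate works out to about $90/hr, which is where the "like paying $100/hr" comparison comes from.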

helloplanets 2 hours ago | parent | prev | next [-]

More straightforward to talk about the hardware directly. Full Kimi K2.6 needs an 8x H200 node to run and serve around 20 heavy users. You can rent an 8x H200 node for around $30/hr.

I'd imagine GPT-5.5 and Claude Opus 4.7 could run just fine on a 16x H200 node and serve at least 10 heavy users without the token output getting choppy.
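
As a back-of-the-envelope check on those numbers, here is a small Python sketch. The $30/hr node price and the user counts are from the comment; the $60/hr figure for a 16x H200 setup is my assumption (two 8x nodes at the same rate), and the 8-hour day is also an assumption.

```python
def cost_per_user_hour(node_usd_per_hour: float, heavy_users: int) -> float:
    """Rental cost per heavy user per hour, assuming the node is fully shared."""
    return node_usd_per_hour / heavy_users


# From the comment: an 8x H200 node at ~$30/hr serving ~20 heavy users.
kimi_per_user = cost_per_user_hour(30.0, 20)

# Assumption: a 16x H200 setup rents for about twice the 8x price.
frontier_per_user = cost_per_user_hour(2 * 30.0, 10)

HOURS_PER_DAY = 8  # assumed working day

for label, hourly in [("8x H200, 20 users", kimi_per_user),
                      ("16x H200, 10 users", frontier_per_user)]:
    print(f"{label}: ${hourly:.2f}/hr per user, "
          f"${hourly * HOURS_PER_DAY:.2f} per {HOURS_PER_DAY}-hour day")
```

This ignores idle time, power, and ops overhead, so treat it as a rough estimate of rental cost only.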

saghm 2 hours ago | parent | prev | next [-]

What's funny is that this apparently wasn't something that the Uber COO seemed to think about when their company is arguably one of the most successful ever at the "subsidize to drive down costs until you capture nearly the entire market" strategy.

fredley 2 hours ago | parent | prev | next [-]

I think if local models catch up with current SOTA then that might not happen. Either way, I don't think the long-term outlook for OAI, Anthropic, etc. really holds up.

cryo32 2 hours ago | parent | prev | next [-]

This is what I’m betting on.

The financials don’t make sense now. Based on the expenditure the finances won’t ever make sense.

BadBadJellyBean 2 hours ago | parent | prev [-]

I have been saying the same for a while. Someone always says "but Anthropic is making money on their API" or "but inference will get cheaper". But I don't believe it. First of all, the investments have to be paid off at some point, and second of all, there are other things that cost money. I don't believe that any of them have a positive balance sheet.

I also don't think that blitz scaling will work like with Uber. The engineers are still there. We can work without the LLM tools.

solenoid0937 2 hours ago | parent [-]

If by "investments will pay off" you mean major profits, that's never going to happen as long as scaling laws hold. All revenue will just go to financing more compute, and either we hit AGI or have the greatest economic collapse in modern history.

The world will look drastically different 5 years from now, for better or worse, so save every penny (especially if you work in tech).