janandonly 8 hours ago

OP here. It is my firm belief that the only realistic use of AI in the future is either locally on-device for almost free, or in the cloud but way more expensive than it is today.

The latter option will only be used for tasks where humans are more expensive or much slower.

This Gemma 4 model gives me hope for a future Siri or similar assistant with iPhone and macOS integration, “Her” (as in the movie) style.

crazygringo 7 hours ago | parent | next [-]

> or in the cloud but way more expensive than it is today.

Why? It's widely understood that the big players are making profit on inference. The only reason they still have losses is because training is so expensive, but you need to do that no matter whether the models are running in the cloud or on your device.

If you think about it, it's always going to be cheaper and more energy-efficient to have dedicated cloud hardware to run models. Running them on your phone, even if possible, is just going to suck up your battery life.

mbesto 7 hours ago | parent | next [-]

> It's widely understood that the big players are making profit on inference.

This is most definitely not widely understood. We still don't know yet. There are plenty of discussions where people disagree on whether it really is profitable. Unless you have proof, don't say "this is widely understood".

igtt 2 hours ago | parent [-]

The reality is we can’t trust accounting earnings anyway.

We need to see the cash flows.

zozbot234 7 hours ago | parent | prev | next [-]

The big players are plausibly making profits on raw API calls, not subscriptions. These are quite costly compared to third-party inference from open models, but even setting that up is a hassle, and you as an end user aren't getting any subsidy. Running inference locally will make a lot of sense for most light and casual users once the subsidies for subscription access cease.

Also, while datacenter-based scale-out of a model over multiple GPUs running large batches is more energy-efficient, it ultimately creates a single point of failure you may wish to avoid.

janalsncm 5 hours ago | parent | prev | next [-]

> It's widely understood that the big players are making profit on inference.

If you add in the cost of training, it’s not profitable.

Not including the cost of training is a bit like saying the only cost of a cup of coffee is the paper cup it comes in. The only way OpenAI gets to charge for inference is by selling a product people can't get much cheaper elsewhere, which takes billions in R&D. But because of competition, each model effectively has a “shelf life”.

tybit 2 hours ago | parent [-]

Anthropic, at least, claims to be profitable on a per-model basis. But since both revenue and training costs are growing exponentially, they pay for model N's training today while only earning revenue from model N-1 today; that offset makes the picture look worse than it is.

Obviously that doesn’t help them turn a profit until they can stop growing training costs exponentially.

So it’s really a race to see whether growth in revenue or training costs decelerates first.
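The race described above can be sketched with a toy cash-flow model. All numbers here are invented for illustration; the point is only the shape of the curve when revenue for model N-1 arrives while the training bill for model N is due.

```python
# Toy cash-flow model of "pay for model N's training today, earn revenue
# on model N-1 today". Growth rates and starting values are made up.

def yearly_cash_flow(years, revenue0, train0, rev_growth, cost_growth):
    """Per year: revenue from last year's model minus next model's training bill."""
    flows = []
    for n in range(years):
        revenue = revenue0 * rev_growth ** n          # revenue of model N-1
        training = train0 * cost_growth ** (n + 1)    # training cost of model N
        flows.append(revenue - training)
    return flows

# If revenue growth outpaces training-cost growth, cash flow eventually flips
# positive even though the early years all show losses.
print(yearly_cash_flow(5, revenue0=1.0, train0=1.0, rev_growth=3.0, cost_growth=2.0))
```

With the growth rates reversed, the losses compound instead, which is the deceleration race the comment points at.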

5 hours ago | parent | prev | next [-]
[deleted]
huijzer 7 hours ago | parent | prev | next [-]

Laptop/desktop could work. Most systems are on the charger most of the time anyway.

nothinkjustai 7 hours ago | parent | prev | next [-]

> It's widely understood that the big players are making profit on inference.

Are they? Or are they just saying that to make their offerings more attractive to investors?

Plus I think most people using agents for coding are on subscriptions, which are definitely not profitable.

Locally running models that are snappy and mostly as capable as current sota models would be a dream. No internet connection required, no payment plans or relying on a third party provider to do your job. No privacy concerns. Etc etc.

nl 5 hours ago | parent | next [-]

> Plus I think most people using agents for coding are on subscriptions, which are definitely not profitable.

Where on earth do people get this idea? Subscriptions based around obscure, vendor-defined "credits" are the perfect business model for vendors. They can change the amount you can use whenever they want.

It's likely they occasionally make a loss on some users but in general they are highly profitable for AI companies:

> Anthropic last month projected it would generate a 40% gross profit margin from selling AI to businesses and application developers in 2025

and

> OpenAI projected a gross margin of around 46% in 2025, including inference costs of both paying and nonpaying ChatGPT users.

https://archive.is/aKFYZ#selection-1075.0-1083.119

nothinkjustai 3 hours ago | parent [-]

Both of those companies are losing hella money, dude. Just because they say they “expect” to be profitable doesn’t mean they are.

zozbot234 7 hours ago | parent | prev [-]

You can pick models that are snappy, or models that are as capable as SOTA. You don't really get both unless you spend extremely unreasonable amounts of money on what is essentially a datacenter-scale inference platform of your own, meant to service hundreds of users at once. (I don't care how many agent harnesses you spin up at once, you aren't going to get the same utilization as hundreds of concurrent users.)

This assessment might change if local AI frameworks start working seriously on support for tensor-parallel distributed inference; then you might get away with cheaper homelab-class hardware and only mildly unreasonable amounts of money.
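For readers unfamiliar with the term, here's a minimal sketch of what tensor parallelism does: a layer's weight matrix is sharded column-wise across devices, each device computes its slice, and the partial outputs are concatenated. This is illustrative only; real frameworks also shard attention heads and handle cross-device communication, which this toy omits.

```python
# Minimal sketch of tensor-parallel matmul: shard a weight matrix
# column-wise across "devices" and stitch the partial results together.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))      # activations for one token
W = rng.standard_normal((8, 16))     # full weight matrix of one linear layer

shards = np.split(W, 4, axis=1)               # one column shard per "device"
partials = [x @ shard for shard in shards]    # each device computes its slice
y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the single-device result.
assert np.allclose(y_parallel, x @ W)
```

The appeal for homelab hardware is that each device only needs to hold its shard of the weights, at the cost of an all-gather step per layer.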

jrflowers 6 hours ago | parent | prev [-]

> It's widely understood that the big players are making profit on inference.

I love the whole “they are making money if you ignore training costs” bit. It is always great to see somebody say something like “if you look at the amount of money that they’re spending it looks bad, but if you look away it looks pretty good” like it’s the money version of a solar eclipse

skybrian 5 hours ago | parent [-]

The reason it matters is that if they are making a profit on inference, then when people use their services more, it cuts their losses. They might even break even eventually and start making a profit without raising the price.

But if they're losing money on inference, they will lose more money when people use their services more. There's no way to turn that around at that price.
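The point about the sign of the per-request margin can be made with a toy unit-economics calculation. All prices and costs below are invented; only the structure (variable margin vs. fixed costs like training) matters.

```python
# Toy unit economics: with a positive per-request margin, more usage
# shrinks the loss from fixed costs; with a negative margin, more usage
# only deepens it. All numbers are invented for illustration.

def annual_profit(requests, price=0.01, cost_per_request=0.008,
                  fixed_costs=1_000_000):
    """Profit = usage * (price - marginal cost) - fixed costs (e.g. training)."""
    return requests * (price - cost_per_request) - fixed_costs

print(annual_profit(100_000_000))    # positive margin, modest usage: still a loss
print(annual_profit(1_000_000_000))  # positive margin, heavy usage: a profit
```

Flip `cost_per_request` above `price` and no amount of usage ever breaks even, which is the scenario the comment warns about.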

_pdp_ 6 hours ago | parent | prev | next [-]

If you can run free models on consumer devices, why do you think cloud providers can't do the same, except better and bundled with a ton of value worth paying for?

amelius 7 hours ago | parent | prev | next [-]

A local model running on a phone owned and controlled by the vendor is still not really exciting, imho.

It may be physically "local" but not in spirit.

0dayman 7 hours ago | parent | prev | next [-]

This is not the first step towards your dream.

kennywinker 7 hours ago | parent | prev [-]

Did you really watch “Her” and think this is a future that should happen??

Seriously????

jfreds 7 hours ago | parent | next [-]

I don’t think OP’s point has anything to do with AI companions.

The big benefit of moving compute to edge devices is distributing the inference load on the grid. Powering and cooling phones is a lot easier than powering and cooling a datacenter.

kennywinker 2 hours ago | parent [-]

Local ai is probably a good direction, i agree. But there was a part of their point that had to do with ai companions: the bit where they say we are closer to “her”-like ai companions. That was the bit i was responding to.

satvikpendem 4 hours ago | parent | prev | next [-]

What does what they said have to do with Her? Local LLMs are better than big corporations owning your data and offering LLMs at a huge cost.

kennywinker 2 hours ago | parent [-]

I get the local ai thing. I agree it’s probably a good direction. The bit that has to do with the movie “her” is the bit at the end where they are excited about “her”-like companions on our phones.

sambapa 7 hours ago | parent | prev | next [-]

Torment Nexus sounds fun

kennywinker 2 hours ago | parent [-]

Watch out! We got an info hazard here! Danger danger

aninteger 7 hours ago | parent | prev | next [-]

Having Scarlett Johansson's voice, or even something less robotic, might not be so bad.

6 hours ago | parent | next [-]
[deleted]
kennywinker 5 hours ago | parent | prev [-]

That happened already, in typical ai fashion: blatant theft https://www.nbcnews.com/tech/scarlett-johansson-legal-action...

nothinkjustai 3 hours ago | parent [-]

How do you steal a frequency?

kennywinker 2 hours ago | parent [-]

Do you genuinely think a “frequency” is what makes a human voice recognizable?

That’s like using someone’s face in an app and then saying “how can you steal pixels?”

nothinkjustai 2 hours ago | parent [-]

How can you steal pixels?

Or rather, what does “ownership” mean? What does it mean to own light waves? What does it mean to own sound waves? Etc

kennywinker 2 hours ago | parent [-]

You can’t steal pixels or frequencies. But you can use someone’s image or their voice to sell your product without their permission.

You can get all existential about it if you want - I just know that if someone used my face or my voice to shill for a product without my permission i’d be pissed. I’m pretty sure you would be too.

nothinkjustai 43 minutes ago | parent [-]

I’d be pissed if my code was used for training an AI too but that seems legal thus far…

7 hours ago | parent | prev | next [-]
[deleted]
esafak 6 hours ago | parent | prev [-]

Unfortunately, one man's dystopia is another's utopia.