giancarlostoro a day ago

> 2. Almost three years in, companies investing in LLMs have not yet discovered a business model that justifies the massive expenditure of training and hosting them, the majority of consumer usage is at the free tier, the industry is seeing the first signs of pulling back investments, and model capabilities are plateauing at a level where most people agree that the output is trite and unpleasant to consume.

You hit the nail on the head, and it's why I get so much hatred from "AI Bros," as I call them, when I say it won't truly take off until it runs on your phone effortlessly, because nobody wants to foot a trillion-dollar cloud bill.

Give me a fully offline LLM that fits in 2GB of VRAM, and let's refine that so it can plug into external APIs, and see how much farther we can take things without burning billions of dollars' worth of GPU compute. I don't need my answer to arrive instantly; if I were doing the research myself, I'd want to take my time to get the correct answer anyway.
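
For a sense of what that looks like today, here's a minimal sketch using llama-cpp-python with a small quantized model. The model filename and quant level are assumptions; any ~1-2B-parameter Q4 GGUF comes in around 1-1.5 GB, inside that 2 GB budget:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Assumed local file: a ~1.5B model at Q4 quantization (~1 GB on disk).
    llm = Llama(
        model_path="qwen2.5-1.5b-instruct-q4_k_m.gguf",
        n_ctx=4096,       # context window
        n_gpu_layers=-1,  # offload all layers to the GPU if one is available
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Why does local inference matter?"}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])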

saratogacx a day ago | parent | next [-]

We actually aren't too far off from that reality. There are several models you can run fully offline on your phone (Phi-3, Gemma 3n E2B-it, and Qwen2.5-1.5B-Instruct all run quite well on my Samsung S24 Ultra). There are also a few offline apps with tool calling (mostly for web search, but I suspect this is extendable).
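
Here's a minimal sketch of how that tool calling can work with a local model. The JSON calling convention, model filename, and web_search stub are all assumptions for illustration, not any particular app's implementation:

    import json
    from llama_cpp import Llama

    llm = Llama(model_path="gemma-3n-e2b-it-q4_k_m.gguf", n_ctx=4096)  # assumed file

    def web_search(query: str) -> str:
        # Stub; a real app would call a search API here.
        return f"(top results for {query!r})"

    SYSTEM = ('To use a tool, reply with JSON only: '
              '{"tool": "web_search", "query": "..."}. '
              'Otherwise just answer.')

    def ask(question: str) -> str:
        messages = [{"role": "system", "content": SYSTEM},
                    {"role": "user", "content": question}]
        reply = llm.create_chat_completion(messages=messages)[
            "choices"][0]["message"]["content"]
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain answer, no tool call
        if isinstance(call, dict) and call.get("tool") == "web_search":
            # Feed the tool result back so the model can answer with it.
            messages += [
                {"role": "assistant", "content": reply},
                {"role": "user",
                 "content": "Tool result: " + web_search(call["query"])},
            ]
            reply = llm.create_chat_completion(messages=messages)[
                "choices"][0]["message"]["content"]
        return reply

    print(ask("What phones shipped with NPUs this year?"))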

If you want to play around a bit and are on Android, PocketPal, ChatterUI, MyDeviceAI, and SmolChat are all good multi-model apps, and Google's Edge Gallery won't keep your chats but is a fun tech demo.

All are on GitHub and can be installed using Obtainium if you don't want to go through an app store.

DSingularity a day ago | parent | prev [-]

You aren’t extrapolating enough. Nearly the entire history of computing has oscillated between shared computing and personal computing. Give it time. These massive cloud bills are building the case for accelerators in phones. It’s going to happen; it just needs time.

giancarlostoro a day ago | parent [-]

That's fine, that's what I want ;) I just grow tired of people hating on me for thinking that we really need to localize the models for them to take off.

DSingularity 21 hours ago | parent [-]

I’m not sure why people are hating on you. If you value freedom, you should love the idea of being independent when it comes to everyday computing. If LLMs are to become common, we should all be rooting for open weights and efficient local execution.

It’s gonna take some time, but I think it’s inevitable.