| ▲ | swazzy 5 hours ago |
| similar vibes as "640k ought to be enough for anybody" |
|
| ▲ | Philip-J-Fry 31 minutes ago | parent | next [-] |
| I think the difference is that with LLMs, in a lot of cases you do see diminishing returns. I won't deny that the latest Claude models are fantastic at one-shotting loads of problems. But we have an internal proxy to a load of models running on Vertex AI, and I accidentally started using Opus/Sonnet 4 instead of 4.6. I genuinely didn't notice until I checked my configuration. AI models will get to the point where, for 99% of problems, something like Gemma is gonna work great for people. Pair it up with an agentic harness on the device that lets it open apps and click buttons and we're done. I still can't fathom that it's 2026, in the middle of the AI boom, and I still can't ask Gemini to turn shuffle mode on in Spotify. I don't think model intelligence is as much of an issue as people think it is. |
| |
| ▲ | mewpmewp2 12 minutes ago | parent [-] | | I mean, to me even the difference between Opus and Sonnet is as clear as night and day, and so is the difference between Opus and the best GPT model. Opus 4.6 just seems much more reliable: when I ask it to do something, it actually happens. | | |
| ▲ | Philip-J-Fry 7 minutes ago | parent [-] | | It depends what you're asking it, though. Sure, in a software development environment the difference between those two models is noticeable. But think about the general user. They're using the free Gemini or ChatGPT. They're not using the latest and greatest. And they're happy using it. And I am willing to bet that a lot of paying users would be served perfectly well by the free models. If a capable model can live on device and solve 99% of people's problems, why would the average person ever need to pay for ChatGPT or Gemini? |
|
|
|
| ▲ | shermantanktop 4 hours ago | parent | prev | next [-] |
| Well you can do a lot with 640k…if you try. We have 16G in base machines and very few people know how to try anymore. The world has moved on, that code-golf time is now spent on ad algorithms or whatever. Escaping the constraint delivered a different future than anticipated. |
| |
| ▲ | throwaw12 2 hours ago | parent | next [-] | | > you can do a lot with 640k…if you try. It's no longer economically viable to try. "XYZ Corp" won't let its developers write their desktop app in Rust just so it consumes only 16MB of RAM, plus a second implementation for mobile in Swift and/or Kotlin, when they can ship a good-enough solution with React + Electron that consumes 4GB of RAM and reuse the components with React Native. | |
| ▲ | jstummbillig an hour ago | parent | prev | next [-] | | People get hung up on bad optimization. If you are working at a sufficiently large scale, then yes, thinking about bytes might be a good use of your time. But most likely it's not. At a system level we don't want people doing that; it's a waste of resources. Making a virtue out of it is bad, unless you care more about bytes than humans. | | |
| ▲ | TeMPOraL 40 minutes ago | parent [-] | | These bytes are human lives. The bytes and the CPU cycles translate into software that takes longer to run, that is more frustrating, and that makes people accomplish less in more time than they could, or should. Take too much, and you prevent them from running other software in parallel, compounding the problem. Or you force them to upgrade hardware early, taking away money they could have spent better in other areas of their lives. All of this scales with the number of users, so for most software with any user base, not caring about bytes and cycles wastes far more person-hours than it saves in dev time. |
| |
| ▲ | stavros 40 minutes ago | parent | prev | next [-] | | The simple fact is that a 16 GB RAM stick costs much less than the development time to make the app run on less. | |
| ▲ | raverbashing an hour ago | parent | prev [-] | | Especially if the 640k are "in your hand" and the rest is "in the cloud" |
|
|
| ▲ | pdpi an hour ago | parent | prev | next [-] |
| Look at the whole history of computing. How many times has the pendulum swung from thin to fat clients and back? I don't think it's even mildly controversial to say that there will be an inflection point where local models get Good Enough and this iteration of the pendulum will swing to fat clients again. |
|
| ▲ | flir 3 hours ago | parent | prev [-] |
| Assuming improvements in LLMs follow a sigmoid curve, even if the cloud models stay slightly ahead in raw performance, it won't make much difference to most people most of the time. Local models have their own advantages (privacy, no as-a-service dependency) that, for many people and orgs, will offset a small performance gap. And, of course, you can always fall back on the cloud models should you hit something particularly chewy. (All IMO - we're all just guessing. For example, good marketing or an as-yet-undiscovered network effect of cloud LLMs might distort this landscape.) |
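| The shrinking-gap intuition can be sketched numerically. This is a toy model, not a claim about real benchmarks: assume cloud and local capability follow the same logistic curve, with the cloud a fixed head start ahead. As both saturate, the absolute gap between them shrinks toward zero. |

```python
import math

def capability(t, rate=1.0, midpoint=0.0):
    """Toy logistic (sigmoid) capability curve, rising from 0 to 1 over time t."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))

# Hypothetical assumption: cloud models hold a fixed one-unit head start on the
# same curve. The absolute gap still vanishes as both approach saturation.
LEAD = 1.0
for t in [0, 2, 4, 6, 8]:
    cloud = capability(t + LEAD)
    local = capability(t)
    print(f"t={t}: cloud={cloud:.3f} local={local:.3f} gap={cloud - local:.3f}")
```

| Under these assumptions the gap falls from about 0.23 at t=0 to well under 0.01 by t=6, which is the point being made: a constant lead in "model time" stops mattering once both curves flatten. |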