▲ Gigachad 5 hours ago
We still aren't going to be putting 200 GB of RAM in a phone in a couple of years to run those local models.
▲ mh- 5 hours ago | parent | next [-]
A lot of people are making the mistake of noticing that local models have been 12-24 months behind SotA ones for a good portion of the last couple of years, and then drawing a dotted line assuming that gap continues to hold. It simply doesn't. The SotA models are enormous now, and there's no free lunch on compression/quantization here. Opus 4.6 capabilities are not coming to your (even 64-128 GB) laptop or phone in the popular architecture that current LLMs use.

Now, that doesn't mean that a much narrower-scoped model with very impressive results can't be delivered. But that narrower model won't have the same breadth of knowledge, and it's TBD whether you can get the quality/outcomes seen with these models without that broad "world" knowledge.

It also doesn't preclude a new architecture or other breakthrough; I'm simply stating it doesn't happen with the current way of building these.

edit: forgot to mention the notion of ASIC-style models on a chip. I haven't been following this closely, but last I saw, the power requirements are too steep for a mobile device.
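To make the "no free lunch" point concrete, here's a rough back-of-envelope in Python. The ~1T parameter count is an illustrative assumption (frontier labs don't disclose sizes), but even aggressive 4-bit quantization of a model at that scale far exceeds phone or laptop RAM:

  # Assumed ~1T parameters for a frontier-scale model (illustrative figure).
  params = 1e12
  for bits in (16, 8, 4):
      gb = params * bits / 8 / 1e9  # bits -> bytes -> GB
      print(f"{bits}-bit: {gb:.0f} GB")
  # 16-bit: 2000 GB
  # 8-bit:  1000 GB
  # 4-bit:   500 GB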
▲ jurmous 4 hours ago | parent | prev [-]
We don’t need 200 GB of RAM on a phone to run big models. Just 200 GB of storage, thanks to Apple’s “LLM in a flash” research.
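The gist of that paper is to keep the full weights on flash and pull only the slices a forward pass actually touches into DRAM on demand. A minimal sketch of the memory-mapping idea in Python (names and file layout are hypothetical; Apple's actual implementation adds sparsity prediction, row/column bundling, and other optimizations):

  import numpy as np

  class FlashWeights:
      """Weights stay on storage; the OS pages in only what we slice."""
      def __init__(self, path, index):
          # np.memmap maps the file without reading it up front.
          self.mm = np.memmap(path, dtype=np.float16, mode="r")
          self.index = index  # assumed layout: {name: (offset_in_elements, shape)}

      def get(self, name):
          off, shape = self.index[name]
          n = int(np.prod(shape))
          # Slicing faults in only the touched pages; cold pages get
          # evicted under memory pressure instead of filling DRAM.
          return self.mm[off:off + n].reshape(shape)

  # Usage (hypothetical tensor name): fetch one layer's weights per step
  # instead of holding the whole model in RAM.
  # w = FlashWeights("model.bin", index).get("layers.0.ffn.w1")

Whether that gets you acceptable tokens/sec on phone-class flash bandwidth is a separate question, but it does decouple model size from RAM size.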