ben_w 2 days ago

We've already got distilled-down versions of models designed to fit on consumer-sized devices, but they are definitely not as performant as the bigger models.

But the models are RAM-limited, not compute-limited, and there's no reason consumer devices need to keep their current RAM limits. Put 256 GB of RAM in your phone and an LLM may drain the battery in 15 minutes, and I have no idea about the bus bandwidth, but the NPU (e.g. the Neural Engine in Apple SoCs for the last few years) is already enough for the compute part of the problem.
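To make the "RAM-limited, not compute-limited" point concrete: during autoregressive decoding, every token generated requires streaming essentially all the model weights through memory, so decode speed is roughly capped by memory bandwidth divided by model size. Here's a rough sketch of that arithmetic; the model size, quantization, and bandwidth figures are illustrative assumptions, not measured specs of any particular device.

```python
# Back-of-envelope: a memory-bound LLM's decode speed is roughly
#   tokens/sec ~= memory bandwidth / bytes streamed per token
# and the bytes streamed per token are approximately the model size.
# All concrete numbers below are illustrative assumptions.

def tokens_per_second(params_billions: float, bytes_per_param: float,
                      bandwidth_gb_s: float) -> float:
    """Upper bound on decode tokens/sec for a memory-bandwidth-bound model."""
    model_size_gb = params_billions * bytes_per_param  # GB of weights to stream
    return bandwidth_gb_s / model_size_gb

# Hypothetical example: a 70B-parameter model at 4-bit quantization
# (~0.5 bytes/param) is ~35 GB of weights -- which is why it needs lots
# of RAM -- and at an assumed ~100 GB/s of LPDDR bandwidth the hard
# ceiling is only a few tokens per second, regardless of NPU speed.
print(round(tokens_per_second(70, 0.5, 100), 1))
```

The same formula also shows why bus bandwidth, not just RAM capacity, becomes the next bottleneck once you fit the model in memory.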