amelius 8 hours ago

LLMs in silicon are the future. It won't be long until you can just plug an LLM chip into your computer and talk to it at 100x the speed of current LLMs. Capability will be lower, but the speed will make up for it.

jillesvangurp 6 hours ago | parent | next [-]

You can always delegate to sub-agents on cloud-based infrastructure for tasks that need more intelligence. But the future is indeed to keep the core interaction loop on the local device, always ready for your input.

A lot of what we ask of these models isn't all that hard. Summarize this, parse that, call this tool, look that up, etc. 99.999% of it really isn't about implementing complex algorithms, solving important math problems, or working through a benchmark of leetcode-style programming exercises. You also don't really need these models to know everything. It's nice if one can hallucinate a decent answer to most questions, but the smarter way is to look up the right answer and then summarize it. Good enough goes a long way. Speed and latency are becoming key selling points. You need enough capability locally to know when to escalate to something slower and more costly.
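That escalation logic can be sketched in a few lines. This is a hypothetical local-first router, not any real API: `local_generate` and `cloud_generate` are illustrative stand-ins for an on-device model and a remote one, and the confidence threshold is an assumed heuristic.

```python
# Hypothetical local-first router: try the small on-device model first,
# escalate to a slower cloud model only when the local answer looks unsure.
# All names and the confidence scores are illustrative, not a real API.

def local_generate(prompt: str) -> tuple[str, float]:
    """Stand-in for an on-device model: returns (answer, confidence in [0, 1])."""
    if "summarize" in prompt.lower():
        return ("short summary...", 0.9)   # easy task: high confidence
    return ("best guess...", 0.3)          # hard task: low confidence

def cloud_generate(prompt: str) -> str:
    """Stand-in for a slower, smarter cloud model."""
    return "detailed answer..."

def answer(prompt: str, threshold: float = 0.7) -> str:
    text, confidence = local_generate(prompt)
    if confidence >= threshold:
        return text                 # fast path: stay on-device
    return cloud_generate(prompt)   # slow path: escalate to the cloud
```

The core interaction loop stays local and instant; only the minority of hard requests pay the network round trip.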

This will drive an overdue increase in the memory size of phones and laptops. Laptops especially have been stuck at the same common base level of 8-16GB for about 15 years now, and Apple still sells laptops with just 8GB. I had a 16GB MacBook Pro in 2012, and at the time that wasn't even that special. My current one has 48GB, enough for some of the nicer models. You can get as much as 256GB today.
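The arithmetic behind "enough for some of the nicer models" is simple: weight footprint is roughly parameter count times bits per weight, divided by 8. A rough sketch (ignoring KV cache and runtime overhead, which add more on top):

```python
def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate model weight footprint in GB: params * bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# An 8B model quantized to 4 bits: ~4 GB -> squeezes into a 16 GB laptop.
print(weight_gb(8, 4))   # 4.0
# A 70B model at 4 bits: ~35 GB -> needs something like a 48 GB machine.
print(weight_gb(70, 4))  # 35.0
```

Which is why 8GB base configurations, fine for 2012, are the binding constraint for local inference today.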

zozbot234 6 hours ago | parent [-]

> This will drive an overdue increase in memory size of phones and laptops.

DRAM costs are still skyrocketing, so no, I don't think so. It's more likely that we'll bring back wear-resistant persistent memory as formerly seen with Intel Optane.

theshrike79 8 hours ago | parent | prev [-]

I'm expecting someone to come up with an LLM version of the Coral USB Accelerator: https://www.coral.ai/products/accelerator

Just plug a stick into your USB-C port, or add an M.2 or PCIe board, and you'll get dramatically faster AI inference.

angoragoats 5 hours ago | parent [-]

I think there are drastic differences between computer vision models and LLMs that you’re not considering. LLMs are huge relative to vision models, and require gobs of fast memory. For this reason a little USB dongle isn’t going to cut it.
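A back-of-envelope calculation shows why. Autoregressive decoding reads roughly every weight once per generated token, so token rate is bounded by memory bandwidth divided by model size. The figures below are assumed round numbers (~0.5 GB/s for a USB 3.0-class link, ~1000 GB/s for high-end GPU memory), not measurements of any specific device:

```python
def decode_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed: every weight is read once per
    token, so tokens/s <= bandwidth / model size. Ignores compute, caching."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 4.0  # e.g. an 8B model at 4-bit quantization

# Streaming weights over a USB 3.0-class link (~0.5 GB/s, assumed):
print(decode_tokens_per_sec(MODEL_GB, 0.5))     # 0.125 tokens/s
# Weights resident in GPU HBM/GDDR (~1000 GB/s, assumed):
print(decode_tokens_per_sec(MODEL_GB, 1000.0))  # 250.0 tokens/s
```

A small vision model fits entirely in a dongle's on-chip memory, so it never hits this wall; a multi-gigabyte LLM does, which is why the "add-in board" that works is the one with wide, fast memory attached.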

Put another way, there already exist add-in boards like this, and they’re called GPUs.

amelius 4 hours ago | parent [-]

GPUs are still software programmable.

An "LLM chip" does not need that and so can be much more efficient.