▲ | api 9 days ago |
Prices are still coming down. Assuming that keeps happening, we will have laptops with enough RAM in the sub-$2k range within 5 years. The question is whether models will keep getting bigger. If useful model sizes plateau, a good model eventually becomes something many people can easily run locally. If models keep usefully growing, that doesn't happen. The largest ones I see are in the 405B-parameter range, which quantized fits in 256GB of RAM.

Long term I expect custom hardware accelerators designed specifically for LLMs to show up, basically an ASIC. If those got affordable, I could see little USB-C accelerator boxes under $1k able to run huge LLMs fast on less power. GPUs are most efficient for batch inference, which lends itself to hosting rather than local use. What I mean is a lighter chip made to run small or single-batch inference very fast using less power. Single-batch inference is memory bandwidth bound, so I suspect fast RAM would be most of the cost of such a device.
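Rough back-of-envelope for why single-batch decoding is bandwidth bound: each generated token has to stream essentially all of the quantized weights through the memory bus once, so tokens/sec is roughly bandwidth divided by model size. The bandwidth figures below are illustrative assumptions, not measurements:

    # Single-batch decode speed ~= memory bandwidth / bytes of weights read per token.
    def tokens_per_sec(params_billion, bits_per_weight, bandwidth_gb_s):
        bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
        return bandwidth_gb_s * 1e9 / bytes_per_token

    model_gb = 405 * 4 / 8  # ~202 GB of 4-bit weights for a 405B model
    for name, bw in [("dual-channel DDR5 (~90 GB/s)", 90),
                     ("unified-memory laptop SoC (~400 GB/s)", 400),
                     ("HBM-class accelerator (~3 TB/s)", 3000)]:
        print(f"{name}: ~{tokens_per_sec(405, 4, bw):.1f} tok/s for a {model_gb:.0f} GB model")

That works out to well under 1 tok/s on ordinary laptop RAM versus double digits on HBM, which is why the RAM is the expensive part of any hypothetical accelerator box.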
▲ | m-s-y 9 days ago |
GPUs are already effectively ASICs for the math that runs both 3D scenes and LLMs, no?