▲ | api 9 days ago |
Prices are still coming down. Assuming that keeps happening, we will have laptops with enough RAM in the sub-$2k range within 5 years. The question is whether models will keep getting bigger. If useful model sizes plateau, a good model eventually becomes something many people can easily run locally. If models keep usefully growing, that doesn't happen. The largest ones I see are in the 405B-parameter range, which quantized fits in 256GB of RAM.

Long term I expect custom hardware accelerators designed specifically for LLMs to show up, basically an ASIC. If those got affordable, I could see little USB-C accelerator boxes under $1k able to run huge LLMs fast on less power. GPUs are most efficient for batch inference, which lends itself to hosting rather than local use. What I mean is a lighter chip made to run small or single-batch inference very fast using less power. Single-batch inference is memory bandwidth bound, so I suspect fast RAM would be most of the cost of such a device.
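Rough back-of-envelope for why single-batch decoding is bandwidth bound: each generated token has to stream essentially all of the quantized weights through the memory bus once, so tokens/sec is roughly bandwidth divided by model size. The bandwidth figures below are illustrative assumptions, not measurements:

    # Single-batch decode speed ~= memory bandwidth / bytes of weights read per token.
    def tokens_per_sec(params_billion, bits_per_weight, bandwidth_gb_s):
        bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
        return bandwidth_gb_s * 1e9 / bytes_per_token

    model_gb = 405 * 4 / 8  # ~202 GB of 4-bit weights for a 405B model
    for name, bw in [("dual-channel DDR5 (~90 GB/s)", 90),
                     ("unified-memory laptop SoC (~400 GB/s)", 400),
                     ("HBM-class accelerator (~3 TB/s)", 3000)]:
        print(f"{name}: ~{tokens_per_sec(405, 4, bw):.1f} tok/s for a {model_gb:.0f} GB model")

That works out to well under 1 tok/s on ordinary laptop RAM versus double digits on HBM, which is why the RAM is the expensive part of any hypothetical accelerator box.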
▲ | m-s-y 9 days ago |
GPUs are already effectively ASICs for the math that runs both 3D scenes and LLMs, no?