Remix clone Hacker News


	▲	yieldcrv 5 days ago
		On r/localllama someone got 120B OSS running on 8 GB of RAM at 35 tokens/sec from the CPU (!!) after noticing that 120B has a different architecture with only 5B “active” parameters. This makes it incredibly cheap to run on existing, off-the-shelf consumer hardware. It's equally likely that GPT-5 leverages a similar architectural advance, which would let them get an order of magnitude more use out of their existing hardware without being bottlenecked by GPU orders and TSMC.
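A rough back-of-envelope sketch of why the "active" parameter count matters so much for CPU inference, assuming token generation is limited by how many weight bytes have to be read from memory per token. The 120B and 5B figures come from the comment above; the bytes-per-parameter and memory-bandwidth values are made-up illustrative assumptions, not measurements, and the function names are hypothetical.

```python
# Back-of-envelope: upper bound on tokens/sec when decoding is memory-bound.
# Ignores KV cache reads, attention cost, and caching effects.

def bytes_read_per_token(params_touched: float, bytes_per_param: float) -> float:
    """Weight bytes that must be streamed from memory to produce one token."""
    return params_touched * bytes_per_param


def max_tokens_per_sec(bandwidth_bytes_per_sec: float,
                       params_touched: float,
                       bytes_per_param: float) -> float:
    """Bandwidth-limited ceiling on decode speed."""
    return bandwidth_bytes_per_sec / bytes_read_per_token(params_touched, bytes_per_param)


TOTAL_PARAMS = 120e9    # from the comment: "120B"
ACTIVE_PARAMS = 5e9     # from the comment: only 5B "active" per token
BYTES_PER_PARAM = 0.5   # ASSUMPTION: roughly 4-bit quantized weights
BANDWIDTH = 50e9        # ASSUMPTION: 50 GB/s of usable memory bandwidth

# If every parameter had to be read for every token (a dense model):
dense_tps = max_tokens_per_sec(BANDWIDTH, TOTAL_PARAMS, BYTES_PER_PARAM)
# If only the active parameters are read for each token:
moe_tps = max_tokens_per_sec(BANDWIDTH, ACTIVE_PARAMS, BYTES_PER_PARAM)

print(f"dense ceiling:  {dense_tps:.2f} tokens/sec")
print(f"active ceiling: {moe_tps:.2f} tokens/sec")
print(f"speedup:        {moe_tps / dense_tps:.0f}x")
```

Under these assumptions the ceiling scales with total/active parameters (120B / 5B), whatever the actual bandwidth is. It says nothing about where the inactive weights live, so it does not by itself explain the 8 GB figure.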
	▲	lelanthran 5 days ago
		> On r/localllama there is someone that got 120B OSS running on 8gb ram and 35 tokens/sec from the CPU (!!) after noticing 120B has a different architecture of only 5B “active” parameters
		If anyone else was as interested as I was, here's the link: https://www.reddit.com/r/LocalLLaMA/comments/1mke7ef/120b_ru...