boutell 5 hours ago:
The speed is ridonkulous, no doubt. The quantization looks pretty severe, which could make the comparison chart misleading. But I tried a trick question suggested by Claude and got nearly identical results from regular ollama and from the chatbot. And quantizing to 3 or 4 bits still wouldn't get you that HOLY CRAP WTF speed on other hardware! This is a very impressive proof of concept. If they can deliver that medium-sized model they're talking about... if they can mass-produce these... I notice you can't order one, so far.
Normal_gaussian 5 hours ago (parent):
I doubt many of us will be able to order one for a long while. There are a significant number of existing datacentre and enterprise use cases that will pay a premium for this. Additionally, LLMs have been tested and found valuable in benchmarks across a large number of domains, but haven't been deployed there because of speed and cost limitations. Those spaces will eat up these chips very quickly.