gpapilion 4 hours ago

Realistically, Groq is a great solution but has near-impossible deployment requirements. Just look at how many adapters you need to meet the memory requirements of even a small LLM. SRAM is fast but small.
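A rough sketch of that memory math, assuming ~230 MB of on-chip SRAM per first-generation Groq chip and FP16 weights (both figures are my assumptions, and this ignores KV cache and activations):

    import math

    def chips_needed(params_billion: float,
                     bytes_per_param: int = 2,
                     sram_mb_per_chip: int = 230) -> int:
        """Minimum chip count to hold the weights entirely in SRAM."""
        weight_bytes = params_billion * 1e9 * bytes_per_param
        sram_bytes = sram_mb_per_chip * 1e6
        return math.ceil(weight_bytes / sram_bytes)

    print(chips_needed(8))   # 70 chips for an 8B model, close to the ~75 cited below
    print(chips_needed(70))  # 609 chips for a 70B model: hundreds of adapters

The chip count scales linearly with parameter count, which is why large models blow up the deployment footprint so quickly.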

I would guess their interconnect technology is what NVIDIA wants. You need something like 75 adapters for an 8B-parameter model, and they had some really interesting tech to make the accelerator-to-accelerator communication work and scale. They were able to do that well before NVL72, and they scale to hundreds of adapters, since larger models require even more adapters.

We will know in a few months.