int_19h 3 hours ago

An 80B MoE model with only ~3B parameters active per token is not a competent model, regardless of what their cherry-picked benchmarks say. This reminds me of when every other llama-7b finetune was claiming to be "GPT-4 quality".
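For context, a back-of-the-envelope sketch of the arithmetic behind that claim (Python used purely for illustration; the only inputs are the 80B/3B figures from the comment): with 3B of 80B parameters active, under 4% of the model's weights participate in any given forward pass.

    # Activation ratio for the MoE model described above.
    # Both figures come from the comment; the fraction is the only derived value.
    total_params = 80e9   # 80B total parameters
    active_params = 3e9   # ~3B parameters activated per token

    active_fraction = active_params / total_params
    print(f"Active per token: {active_fraction:.1%} of total weights")  # -> 3.8%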