Remix clone Hacker News

	▲	sosodev 4 hours ago
		Around 20 tokens a second with a 6-bit quant at very long context lengths on my AMD AI Max 395+. I’m trying to use local models whenever possible, but I still need to lean on the frontier models sometimes.