Remix clone Hacker News


littlestymaar 3 hours ago
So it's D-Flash, but applied at each transformer layer and sharing the KV cache of the original model? Very smart!