zozbot234 10 hours ago

Shouldn't FlashAttention address the quadratic growth in memory footprint (w.r.t. sequence length) during fine-tuning/training? I'm also fairly sure it doesn't apply to pure inference, because of how KV caching works.
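A rough back-of-the-envelope sketch of the distinction being made: materializing the full attention score matrix is quadratic in sequence length, while the KV cache grows only linearly. All shapes, dtype sizes, and function names below are my own illustrative assumptions, not from any particular model:

```python
# Illustrative memory estimates; fp16 (2 bytes/element) assumed throughout.

def naive_attn_matrix_bytes(seq_len: int, n_heads: int, bytes_per_el: int = 2) -> int:
    """Naive attention materializes a (seq_len x seq_len) score matrix per head:
    memory grows quadratically with sequence length."""
    return n_heads * seq_len * seq_len * bytes_per_el

def kv_cache_bytes(seq_len: int, n_heads: int, head_dim: int, bytes_per_el: int = 2) -> int:
    """The KV cache stores K and V vectors for every past token:
    memory grows linearly with sequence length."""
    return 2 * n_heads * seq_len * head_dim * bytes_per_el

# Hypothetical model config: 8192-token context, 32 heads, head_dim 128.
S, H, D = 8192, 32, 128
print(naive_attn_matrix_bytes(S, H) / 2**30)  # 4.0   GiB of score matrices
print(kv_cache_bytes(S, H, D) / 2**30)        # 0.125 GiB of KV cache
```

FlashAttention's contribution is computing attention in tiles so the quadratic score matrix is never written to GPU memory, which is why it matters most where that matrix would otherwise dominate (long-sequence training); in cached autoregressive decoding each new token attends via a 1-by-T row of scores, so there is no quadratic buffer to eliminate.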