bcjdjsndon 3 days ago

Optimizations, like I said. They'll never hack away the massive memory requirements, though, or the pre-training. Imagine the memory requirements without the pre-training step... this is just part and parcel of the transformer architecture.
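To put a rough number on the memory point: even ignoring the weights themselves, a decoder-only transformer's key/value cache grows linearly with context length and layer count. The function and the model configuration below are illustrative assumptions (a hypothetical 7B-class model in fp16), not figures for any specific model.

```python
# Back-of-envelope KV-cache memory for a decoder-only transformer.
# All model numbers here are illustrative assumptions.

def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Each layer caches a key and a value vector (hence the factor of 2)
    # of size heads * head_dim per token, so the cache grows linearly
    # with sequence length.
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical configuration: 32 layers, 32 heads of dim 128,
# 4096-token context, fp16 (2 bytes per element).
gb = kv_cache_bytes(32, 32, 128, 4096, 1) / 1024**3
print(f"{gb:.1f} GiB of KV cache per sequence")  # → 2.0 GiB
```

That's on top of the parameters themselves, and it scales with every extra token of context and every concurrent user.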

bcjdjsndon 3 days ago | parent [-]

And a lot of these improvements are really just classic automation, or chaining together yet more transformer architectures to fix issues the transformer architecture creates in the first place (hallucinations, limited context).