Remix clone Hacker News

	▲	danieldk 5 days ago
		Before highly optimized decoder implementations existed, decoder prefill was often implemented by simply reusing the encoder implementation and applying a causal mask to the attention logits before the softmax, so that tokens could not attend to future tokens.
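
To make the masking concrete, here is a minimal sketch in plain Python of the approach described above. It is not taken from any particular library: the `attention` and `softmax` helpers, the `causal` flag, and the toy input `x` are all hypothetical names chosen for illustration. The same attention routine serves as an encoder (bidirectional) or as a decoder prefill pass, and the only difference is that future positions get a logit of negative infinity before the softmax.

```python
import math

def softmax(row):
    # Numerically stable softmax; math.exp(-inf) evaluates to 0.0,
    # so masked positions receive exactly zero weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention over lists of vectors.

    Hypothetical helper for illustration. With causal=False it behaves
    like encoder self-attention; with causal=True the logits of future
    positions are masked before the softmax.
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if causal:
            # Causal mask: position i may only attend to positions j <= i.
            logits = [x if j <= i else -math.inf for j, x in enumerate(logits)]
        w = softmax(logits)
        weights.append(w)
        outputs.append(
            [sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))]
        )
    return outputs, weights

# Toy sequence of three 2-dimensional token vectors (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enc_out, enc_w = attention(x, x, x, causal=False)  # encoder-style
dec_out, dec_w = attention(x, x, x, causal=True)   # decoder prefill
```

With `causal=True`, every weight above the diagonal of `dec_w` is exactly zero, so the first token attends only to itself and its output is unaffected by later tokens. The cost is that the masked logits are still computed and then thrown away, which is the kind of waste a dedicated decoder kernel can avoid.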