Remix clone Hacker News

	mudkipdev a day ago
		It most likely means the tokenizer's training corpus included a massive amount of German literature, or accidentally oversampled a web page where that word was repeated over and over. Look up "glitch tokens" to learn more.
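To make the oversampling point concrete, here is a rough sketch of a toy byte-pair-encoding (BPE) trainer. It is not any real tokenizer's implementation, and it assumes the tokenizer in question is BPE-like. The function name `train_bpe`, the tiny corpus, and the word "Beispielwort" are made up for illustration. The idea it shows: merges are chosen purely by frequency, so a word repeated often enough in the training text ends up as a single token.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Each word starts as a tuple of single characters, weighted by its count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the weighted vocabulary.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        merged_vocab = Counter()
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_vocab[tuple(out)] += count
        vocab = merged_vocab
    return merges, vocab


# Hypothetical corpus: a little ordinary text plus one heavily repeated word.
ordinary = "the cat sat on the mat".split() * 3
oversampled = ["Beispielwort"] * 500  # placeholder word, not from the thread

merges, vocab = train_bpe(ordinary + oversampled, num_merges=11)

for symbols, count in vocab.items():
    print(count, symbols)
```

With these made-up counts, all 11 merges get spent on the repeated word, so it collapses into one symbol while "the" is still split into single characters. A "glitch token" is roughly what you get when such a token lands in the vocabulary but the model itself rarely or never sees it during training.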