Remix clone Hacker News

	▲	Perz1val 5 hours ago
		But it can't. We see models getting larger and larger, and larger models perform better. <Thinking> made such huge improvements because it produces more text for the language model to process. Cavemanising the output (a lossy compression) cavemanises the input as well, because the model reads its own output back as context.
	▲	spacemanspiff01 4 hours ago
		But some tokens are not really needed? Doing this with today's models is probably bad because it is mismatched with the training set. But if you trained a model on a dataset with all prepositions removed (or whatever caveman speak is), would performance degrade compared to the same model trained on the same dataset without the caveman translation?
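To make "caveman translation" concrete, here is a rough sketch of what such a preprocessing step might look like. Everything in it is an assumption for illustration: the `DROPPED` word list and the `cavemanise` helper are hypothetical, and it counts whitespace-separated words rather than real tokenizer tokens, so the ratio is only a crude proxy for token savings.

```python
# Hypothetical word list: articles plus a few common prepositions.
# A real "caveman" filter might drop other word classes too.
DROPPED = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by", "from"}


def cavemanise(text: str) -> str:
    """Drop articles and common prepositions, keeping the remaining words in order."""
    kept = [w for w in text.split() if w.lower().strip(".,!?") not in DROPPED]
    return " ".join(kept)


original = "The model was trained on a dataset of posts from the forum."
compressed = cavemanise(original)

# Word-count ratio, a crude stand-in for how many tokens would be saved.
ratio = len(compressed.split()) / len(original.split())

print(compressed)                    # model was trained dataset posts forum.
print(f"kept {ratio:.0%} of words")  # kept 50% of words
```

The compression is lossy in the sense discussed above: "on", "of", and "from" carried the relations between the nouns, and nothing in the output lets you recover them. Whether a model trained end to end on text like this would lose anything is exactly the open question; this sketch only shows the transformation, not the answer.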