Remix clone Hacker News

	mattnewton 13 hours ago
		Maybe I was being imprecise, but I’m not sure what you mean by “not how LLMs work” - discovering patterns in how humans write is exactly the signal they are trained against. That signal is either explicitly curated, as in SFT, or coaxed out during RLHF, no? It could even have been picked up in pretraining and then rewarded during RLHF when the output domain was being refined; I haven’t used enough LLMs from before post-training to know at which step it usually becomes noticeable.