Remix clone Hacker News


	▲	awakeasleep 2 days ago
		That's not a real rebuttal. First, in the pretraining stage, humans curate and filter the data that's actually used for training. Then, in the fine-tuning stage, people write ideal examples to teach task performance. Then there is reinforcement learning from human feedback (RLHF), where people rank multiple variations of the answer an AI gives, and that's part of the reinforcement loop. So there is really quite a bit of human effort and direction that goes into preventing the garbage-in, garbage-out situation you're referring to.
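		To make the ranking step a bit more concrete, here is a rough sketch of how a human ranking can become a training signal. It assumes the common pairwise (Bradley-Terry style) setup for a reward model; the function names and the toy scores are made up for illustration and are not any particular lab's pipeline.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_answers):
    """Turn a best-to-worst human ranking into (preferred, rejected) pairs.

    Hypothetical helper: combinations() keeps input order, so the
    higher-ranked answer always comes first in each pair.
    """
    return list(combinations(ranked_answers, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(preferred - rejected)).

    The loss is small when the reward model scores the human-preferred
    answer higher, and large when it scores the rejected one higher.
    """
    diff = score_preferred - score_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    # A labeler ranked three candidate answers, best first.
    pairs = ranking_to_pairs(["answer_a", "answer_b", "answer_c"])
    print(pairs)

    # Toy reward-model scores (assumed values, not real data).
    scores = {"answer_a": 1.5, "answer_b": 0.2, "answer_c": -0.7}
    total = sum(pairwise_loss(scores[p], scores[r]) for p, r in pairs)
    print(total)
```

		The point is just that each human ranking gets turned into several "this one beats that one" comparisons, and the reward model is nudged to agree with them.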