Remix clone Hacker News

new | show | ask | jobs Github

	▲	potatolicious 3 days ago
		> "Whether it's a naturally emergent phenomenon in LLMs or specifically a result of its corporate environment, I'd like to know the answer." I heavily suspect this is down to the RLHF step. The conversations the model is trained on provide the "voice" of the model, and I suspect the sycophancy is (mostly, the base model is always there) comes in through that vector. As for why the RLHF data is sycophantic, I suspect that a lot of it is because the data is human-rated, and humans like sycophancy (or at least, the humans that did the rating did). On the aggregate human raters ranked sycophantic responses higher than non-sycophantic responses. Given a large enough set of this data you'll cover pretty much every kind of sycophancy. The systems are (rarely) instructed to be sycophantic, intentionally or otherwise, but like all things ML human biases are baked in by the data.