vidarh 5 hours ago
I worked on a project that did fine-tuning and RLHF[1] for a major provider, and you would not believe how utterly broken a large proportion of the real users' prompts were. The project rules required practically reading tea leaves to divine the best response, even for prompts that were not remotely coherent human language.

[1] Reinforcement learning from human feedback: participants were shown two model responses and judged them on multiple criteria relative to the prompt.
redman25 an hour ago | parent
I feel like the right response in those situations is to start asking the user questions. That's what a human would do if they didn't understand.