We can trust the feedback we give it based on the output it provides.

What kind of feedback are you giving? What's the reward function?

	▲	jack_pp an hour ago \| parent [-]
		Right now, no feedback since I don't run this system but our workflows could change to accommodate it