I've found that if I tell a judge that the answer came from a small, weak local LLM, it will pick the answer apart brutally... but since I haven't done this systematically, I don't know how well it generalizes past my vibes.

Anyone else feel like if you can trick the LLM into a mode where it "feels" superior, it will act the asshole very well?

fridder 6 hours ago | parent [-]

Yeah. I usually do this by telling it to be adversarial and find gaps and holes. Not foolproof, but it does seem to increase the quality. It has helped when using local models in particular.
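
For anyone who wants to try this a bit more systematically, here is a rough sketch of how the framings discussed here could be turned into interchangeable judge prompts. Everything in it is an assumption for illustration: the framing wording, the names `FRAMINGS`, `build_judge_prompt`, and `build_all_variants`, and the 1-10 scoring line are all made up, and it only builds prompt strings (sending them to a model is left out).

```python
# Hypothetical framings; the wording is illustrative, not a tested prompt.
FRAMINGS = {
    "neutral": "Review the answer below.",
    "adversarial": (
        "Be adversarial: look for gaps, holes, and unsupported claims "
        "in the answer below."
    ),
    "weak_source": (
        "The answer below came from a small, weak local LLM. "
        "Review it critically."
    ),
}


def build_judge_prompt(question: str, answer: str, framing: str = "neutral") -> str:
    """Assemble a judge prompt with the chosen framing placed first."""
    if framing not in FRAMINGS:
        raise ValueError(f"unknown framing: {framing!r}")
    return (
        f"{FRAMINGS[framing]}\n\n"
        f"Question:\n{question}\n\n"
        f"Answer:\n{answer}\n\n"
        # The scoring instruction is an assumption, added so that runs
        # with different framings can be compared on the same scale.
        "List each problem you find, then give a 1-10 score."
    )


def build_all_variants(question: str, answer: str) -> dict[str, str]:
    """Return one prompt per framing for the same question/answer pair."""
    return {name: build_judge_prompt(question, answer, name) for name in FRAMINGS}
```

Running the same question/answer pair through each variant and comparing the judge's output side by side would be one way to check whether the effect holds up beyond vibes.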

	SubiculumCode 5 hours ago | parent [-]
		Yeah, you have to shortcut the RL-trained people-pleasing.