Der_Einzige 4 hours ago
This is extremely obvious to anyone who's read other papers. There are tons of papers showing that LLMs prefer their own outputs. It's a big enough problem that, in papers, the LLM-as-judge has to be a different LLM from the one you are testing.