xandrius 5 hours ago
I think people are misunderstanding how reward functions relate to LLMs. LLMs don't actually have a reward system the way some other ML models do.
storus 4 hours ago
They are trained with one (e.g. the reward model in RLHF), and if you look at DPO (Direct Preference Optimization), you can argue they contain an implicit one as well.
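To make the "implicit reward" point concrete: in DPO (Rafailov et al., 2023, "Direct Preference Optimization: Your Language Model is Secretly a Reward Model") no separate reward model is trained. The reward is read off the policy itself as beta * (log pi_theta(y|x) - log pi_ref(y|x)), up to a term that depends only on the prompt and cancels when two completions for the same prompt are compared. Below is a minimal sketch, assuming you already have summed per-sequence log-probs from the policy and a frozen reference model. The function names and the beta = 0.1 default are illustrative only and don't come from any particular library.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO implicit reward for completion y given prompt x (illustrative helper).

    r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)), dropping the
    prompt-only term beta * log Z(x), which cancels in pairwise comparisons.
    """
    return beta * (logp_policy - logp_ref)


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    logp_pol_chosen: float,
    logp_ref_chosen: float,
    logp_pol_rejected: float,
    logp_ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log sigmoid(r_chosen - r_rejected) (illustrative helper)."""
    margin = implicit_reward(logp_pol_chosen, logp_ref_chosen, beta) - implicit_reward(
        logp_pol_rejected, logp_ref_rejected, beta
    )
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return _softplus(-margin)


if __name__ == "__main__":
    # Made-up log-probs: the policy raised the chosen answer and lowered the
    # rejected one relative to the reference model.
    print(implicit_reward(-10.0, -12.0))          # reward of chosen answer
    print(dpo_loss(-10.0, -12.0, -15.0, -14.0))   # loss for this preference pair
```

The only learned object here is the policy. The "reward" is just its log-ratio against the reference model, which is why you can say a DPO-trained model carries its reward implicitly.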