I think it’s mostly because they are incentivised to answer verbatim, the way medical students do, rather than from their own understanding. RL methods change that.