parentheses 2 days ago

I think a large issue at play here is post-training. Pre-training models the original distribution of the input data. RL techniques then tweak the models to "behave". This step changes how the models "think" in a fundamental way.