ManlyBread 3 days ago
> In fact, this paper found that more than that, it thinks American.

I think that's because it seems to be primarily trained on Reddit and therefore mirrors everything Reddit stands for. Not a good thing considering just how overrun the site is with bots and political activists of all kinds.
rollcat 3 days ago
You're absolutely right! Social media sites like Reddit are overrun with bots, sycophants, and trolls trying to provoke reactions by engaging in controversial topics. This forms echo chambers, which are a subpar source of training data, and those biases are reflected in LLM responses.
TimByte 3 days ago
I wonder how much of that actually survives token filtering during training.