ManlyBread 3 days ago

>In fact, this paper found that more than that, it thinks American.

I think that's because it seems to be primarily trained on Reddit and therefore mirrors everything Reddit stands for. Not a good thing considering just how overrun the site is with bots and political activists of all kinds.

rollcat 3 days ago | parent | next [-]

You're absolutely right! Social media sites like Reddit are overrun with bots, sycophants, and trolls trying to provoke reactions by engaging in controversial topics. This forms echo chambers, which are a sub-par source of training data, and their biases end up reflected in LLM responses.

TimByte 3 days ago | parent | prev [-]

I wonder how much of that actually survives token filtering during training