dev_l1x_be 11 hours ago

If you train an LLM on Reddit/Tumblr, would you consider it skewed toward certain political ideas?

dalemhurley 10 hours ago

Worse. It is trained on the most extreme and loudest views. The average punter isn’t posting “yeah…nah…look I don’t like it but sure I see the nuances and fair is fair”.

To make it worse, those who do focus on nuance and complexity get little attention and engagement, so the LLM ignores them.

intended 5 hours ago

That’s essentially true of the whole Internet.

The content that persists is whatever is most capable of surviving and being reproduced.

So by default, the content being created is going to be clickbait, attention-grabbing content.

I’m pretty sure the training data is adjusted to counter this drift, but that means there’s no LLM that isn’t skewed.