Remix clone Hacker News

	▲	xp84 6 days ago
		Somehow I am not convinced that this is true. Most of the BS on the Internet is on social media (and maybe, among older data, on the old forums that existed mainly for social reasons rather than to explore and further factual knowledge). Even Reddit comments, on the whole, have far more reality-focused material than shitposting and rudeness. I don't think any of these big models were trained at all on 4chan, YouTube comments, Instagram comments, Twitter, etc., or even Wikipedia Talk pages. It just wouldn't add anything useful to train on that garbage. On the other hand, most Stack Overflow pages are objective, and to the extent there are suboptimal answers, eventually someone explains why a given answer is suboptimal. So I accept that some UGC went into the model, and that there's a reason for it, but I don't believe anything as broad as "The Internet" is represented there.