kjellsbells 5 days ago
> Meta has a massive corpus of posts, comments, interactions, etc to train AI

I question whether the corpus is of particularly high quality and therefore valuable source data to train on.

On the one hand: 20+ years of posts. In hundreds of languages (very useful to counteract the extreme English-centricity of most AI today).

On the other hand: 15+ years of those posts are clustered on a tiny number of topics, like politics and selling marketplace items. Not very useful unless you are building RagebaitAI, I suppose. Reddit's data would seem to be far more valuable on that basis.