fl4regun 3 hours ago
I agree with you, this is a huge concern, and we're still in an age where most content on the internet isn't AI generated yet. What about 10 years from now? We already have many instances of people writing Reddit posts or uploading videos and blogs using AI-generated text. What happens when that becomes a significant percentage of content? I recently saw a video about a researcher who published a fake scientific article on a fictitious disease, with bogus author names and even a warning IN the article itself stating "This is not a real disease, this article is not real" (paraphrasing). AI still picked up the article and served information from it as if the disease were real. It even got cited in papers (which were later retracted, of course), but the fact that those papers got published in the first place is a serious issue.
amluto 2 hours ago
> I recently saw a video discussing a researcher who published a fake scientific article about a fictitious disease, with bogus author names, even a warning IN the article itself that stated "This is not a real disease, this article is not real" (paraphrasing) but still AI ended up picking up this article and serving information from it as if it was a real disease.

Isn't a lot of pretraining done by chopping sources up into context-window-sized pieces and then shoving them into the SGD process? The AI-in-training could be entirely incapable of correlating the beginning of the article with the end while building up its supposed knowledge base.
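A toy sketch of what that chunking looks like, assuming a crude whitespace tokenizer and a made-up article (the disease name "Examplitis" and the chunk size are hypothetical, not from any real pipeline). The point is just that the disclaimer at the top of the document lands in a different chunk than most of the body, so most training examples carry the "facts" without the warning:

```python
# Hypothetical sketch: splitting a document into fixed-size token chunks,
# the way pretraining pipelines often pack text into context windows.
# The disclaimer only survives in the first chunk.

article = (
    "DISCLAIMER: this is not a real disease; this article is fictional. "
    + "Examplitis syndrome presents with fatigue and joint pain. " * 50
)

tokens = article.split()  # crude whitespace "tokenizer" (assumption)
CHUNK = 64                # pretend context window, in tokens (assumption)

chunks = [tokens[i:i + CHUNK] for i in range(0, len(tokens), CHUNK)]

# Which chunks contain the warning?
with_warning = [i for i, c in enumerate(chunks) if "DISCLAIMER:" in c]
print(with_warning)   # -> [0]
print(len(chunks))    # -> 7: six chunks never see the disclaimer
```

Real pipelines also often concatenate many documents and split at arbitrary boundaries, which makes the separation between a warning and the text it qualifies even more likely.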