hydrogen7800 9 hours ago:
Right, that success story exists only because there was "organic" (for lack of a better term) information from an original source. What happens when all information is nth-generation AI feedback, with all links to the original source lost?

Edit: a question from AI/LLM ignorance: can the source database for an LLM be one-way, in that it does not contain output from itself or from other LLMs? I can imagine a quarantined, curated database used for specific applications, but this seems impossible on the open internet.
bigthymer 8 hours ago:
> Can the source database for an LLM be one-way, in that it does not contain output from itself, or other LLMs?

For public internet data, I think we can only be reasonably confident about information published before the big public release of ChatGPT.
nsvd2 5 hours ago:
Yes, people have likened pre-LLM internet content to low-background steel. If, in some hypothetical future, the continual-learning problem gets solved, an AI could learn from the real world instead of from publications and retain that data.
nprateem 2 hours ago:
This is one reason why Google built its algorithm (SynthID) to watermark AI output.
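SynthID itself is proprietary, but the general idea behind statistical text watermarking, in the style of the published "greenlist" schemes, can be sketched in a few lines. Everything below is illustrative: the hash choice, the toy vocabulary, and the 50% greenlist fraction are assumptions, not Google's actual scheme.

```python
import hashlib
import random


def greenlist(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Derive a pseudo-random 'green' subset of the vocabulary from the
    previous token, so generator and detector agree without shared state."""
    # Illustrative: hash the previous token into a PRNG seed.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))


def green_fraction(tokens: list[str], vocab: list[str]) -> float:
    """Detector side: the fraction of tokens that fall inside the greenlist
    chosen by their predecessor. Watermarked text scores well above the
    baseline fraction; unrelated text hovers near it."""
    pairs = list(zip(tokens, tokens[1:]))
    hits = sum(1 for prev, cur in pairs if cur in greenlist(prev, vocab))
    return hits / len(pairs) if pairs else 0.0
```

A generator that biases its sampling toward each step's greenlist produces text whose `green_fraction` is far above the baseline, which a simple statistical test can flag even without access to the model.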
black_puppydog 9 hours ago:
That's exactly why text written before the first LLMs commands a premium these days. So no: all major models suffer from slop in their training data.