Remix clone Hacker News

	mrweasel 6 hours ago
		> There must be a ton of companies with very large document collections at this point
		See, I don't think there are; I don't think they want that expense. It's basically the Linus Torvalds philosophy of data storage: if it's on the Internet, I don't need a backup. While I have absolutely no proof of this, I'd guess that many AI companies just crawl the Internet constantly and never save any of the data. We're seeing some of these scrapers go to great lengths to circumvent any and all forms of caching; they aren't interested in a two-week-old copy of anything.
	n1xis10t 26 minutes ago
		Could be. Can you train a model without saving things though?