Remix clone Hacker News

	▲	CuriouslyC 9 hours ago
		More data isn't automatically better. You're trying to build the most accurate model possible of the "true" latent space (estimated from user preferences/computational oracles). More data can give you more coverage of the latent space, smooth out your estimate of it, and let you bake more knowledge in (TBH this is low value though, since freshness is a problem). If you add more data that doesn't cover a new part of the latent space, its value quickly goes to zero as redundancy increases. Also, you have to be careful that the data you add doesn't give the model unhelpful biases.
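
		A toy sketch of the redundancy point, under loud assumptions: the "latent space" is just the unit square, "coverage" is the fraction of grid cells containing at least one sample, and the names `coverage` and `sample` are made up for illustration. It is not how anyone actually measures this for a real model; it only shows that adding points to an already-covered region buys little compared with adding points somewhere new.

```python
import random

def coverage(points, bins=10):
    """Fraction of cells in a bins x bins grid over the unit square that hold a point."""
    # Clamp the index so a value that rounds up to exactly 1.0 stays in the last cell.
    cells = {
        (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        for x, y in points
    }
    return len(cells) / (bins * bins)

def sample(n, lo, hi, rng):
    """Draw n points uniformly from the square spanning lo..hi on both axes."""
    return [(rng.uniform(lo, hi), rng.uniform(lo, hi)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable

base = sample(500, 0.0, 0.5, rng)       # existing data: lower-left quadrant only
redundant = sample(500, 0.0, 0.5, rng)  # more data from the same region
novel = sample(500, 0.5, 1.0, rng)      # data from a region not yet covered

base_cov = coverage(base)
gain_redundant = coverage(base + redundant) - base_cov
gain_novel = coverage(base + novel) - base_cov

print(f"base coverage:       {base_cov:.2f}")
print(f"gain from redundant: {gain_redundant:.2f}")
print(f"gain from novel:     {gain_novel:.2f}")
```

		Doubling the dataset with same-region samples barely moves the coverage number, while the same number of samples from a new region raises it. Real latent spaces are high-dimensional and coverage there is much harder to estimate, so treat this as intuition only.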