Someone · 2 days ago
> I was taken aback when I saw what was basically zero recall loss in the real-world task of finding related topics

By moving the values to a single bit, you're lumping together items that were distinct before, so recall loss isn't what you'd expect; if anything, precision should suffer. Also: even if your vectors are only 100-dimensional, there are already 2^100 distinct bit vectors. That's over 10^30. Unless your dataset is gigantic, and provided its documents are even moderately dispersed in that space, it's unlikely that many of them share the same bit vector.
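For concreteness, a minimal sketch of the setup being discussed, assuming NumPy, sign-based binarization, and Hamming-distance ranking (none of which the comments specify): 10,000 synthetic 100-d embeddings collapse to one bit per dimension, and exact code collisions stay rare, per the 2^100 argument above.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 100))   # 10k hypothetical 100-d embeddings
query = rng.normal(size=100)

def binarize(x):
    """One bit per dimension: 1 where the value is positive, else 0."""
    return (x > 0).astype(np.uint8)

doc_bits = binarize(docs)
query_bits = binarize(query)

# Hamming distance = number of differing bits; rank docs by it.
hamming = np.count_nonzero(doc_bits != query_bits, axis=1)
top10 = np.argsort(hamming)[:10]

# With 2^100 possible codes, exact collisions among 10k dispersed
# vectors are vanishingly rare.
collisions = len(doc_bits) - len(np.unique(doc_bits, axis=0))
print(top10, collisions)
```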
barrkel · 2 days ago
And if dispersion isn't good, it would be worthwhile running the vectors through another model trained to disperse them.
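A trained dispersion model is one option; a cheap untrained stand-in from the LSH literature is mean-centering plus a random orthogonal rotation before taking the sign bits (ITQ learns that rotation rather than sampling it). A hedged sketch, with the offset data and all names purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
# Hypothetical poorly dispersed embeddings: a strong positive offset
# pushes nearly every coordinate positive, so raw sign bits collide.
docs = rng.normal(size=(1000, d)) * 0.1 + 1.0

def distinct_codes(bits):
    return len(np.unique(bits, axis=0))

raw_bits = (docs > 0).astype(np.uint8)
print("distinct raw codes:", distinct_codes(raw_bits))      # close to 1

# Center, then apply a random orthogonal rotation (QR of a Gaussian
# matrix). Centering alone fixes this toy case; the rotation further
# balances variance across bit positions when coordinates are unequal.
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
dispersed = (docs - docs.mean(axis=0)) @ rotation
good_bits = (dispersed > 0).astype(np.uint8)
print("distinct dispersed codes:", distinct_codes(good_bits))  # ~1000
```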