Someone · 2 days ago
> I was taken aback when I saw what was basically zero recall loss in the real-world task of finding related topics

By moving the values to a single bit, you're lumping together items that were distinct before, so recall loss isn't what you'd expect; if anything, precision should suffer. Also: even if your vectors are only 100-dimensional, there are already 2^100 distinct bit vectors. That's over 10^30. Unless your dataset is gigantic, and provided its documents are even moderately dispersed in that space, it's unlikely that many of them share the same bit vector.
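For concreteness, a minimal sketch of the setup being discussed, assuming NumPy, sign-based binarization, and Hamming-distance ranking (none of which the comments specify): 10,000 synthetic 100-d embeddings collapse to one bit per dimension, and exact code collisions stay rare, per the 2^100 argument above.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 100))   # 10k hypothetical 100-d embeddings
query = rng.normal(size=100)

def binarize(x):
    """One bit per dimension: 1 where the value is positive, else 0."""
    return (x > 0).astype(np.uint8)

doc_bits = binarize(docs)
query_bits = binarize(query)

# Hamming distance = number of differing bits; rank docs by it.
hamming = np.count_nonzero(doc_bits != query_bits, axis=1)
top10 = np.argsort(hamming)[:10]

# With 2^100 possible codes, exact collisions among 10k dispersed
# vectors are vanishingly rare.
collisions = len(doc_bits) - len(np.unique(doc_bits, axis=0))
print(top10, collisions)
```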
barrkel · 2 days ago
And if dispersion isn't good, it would be worthwhile running the vectors through another model trained to disperse them.
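A trained dispersion model is one option; a cheap untrained stand-in from the LSH literature is mean-centering plus a random orthogonal rotation before taking the sign bits (ITQ learns that rotation rather than sampling it). A hedged sketch, with the offset data and all names purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
# Hypothetical poorly dispersed embeddings: a strong positive offset
# pushes nearly every coordinate positive, so raw sign bits collide.
docs = rng.normal(size=(1000, d)) * 0.1 + 1.0

def distinct_codes(bits):
    return len(np.unique(bits, axis=0))

raw_bits = (docs > 0).astype(np.uint8)
print("distinct raw codes:", distinct_codes(raw_bits))      # close to 1

# Center, then apply a random orthogonal rotation (QR of a Gaussian
# matrix). Centering alone fixes this toy case; the rotation further
# balances variance across bit positions when coordinates are unequal.
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
dispersed = (docs - docs.mean(axis=0)) @ rotation
good_bits = (dispersed > 0).astype(np.uint8)
print("distinct dispersed codes:", distinct_codes(good_bits))  # ~1000
```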