vintermann 8 hours ago
One thing I haven't seen anyone bring up in this thread yet is that there's a big risk of leakage. If CSAM managed to sneak into the training data of even the big image models, how can we trust that data from our own time hasn't snuck into these historical models? I've used Google Books a lot in the past, along with Google's time-filtering search feature, not to mention Spotify's filters for date of production. All of them had huge temporal mislabeling problems.