pixl97 5 days ago

>find small perturbations that are undetectable to humans but produce a large change in model behavior.

What artists don't realize is that by doing this they are just improving the models relative to human capabilities. Adversarial techniques, for example making a stop sign look like something else to a model, will likely be weeded out as model performance converges to average or above-average human performance.

pogue 5 days ago

How long until somebody comes up with another reCAPTCHA-type system that forces users to click on images to identify them, but that data is then used to verify training data for LLMs? (Assuming this isn't happening already.)

alpaca128 4 days ago

Google’s captchas have always been used for AI training as far as I know. For example the early versions where you had to type in two displayed words were used for Google’s book scanning program.

pogue 4 days ago

Well, the original purpose was to do OCR for things like the NYT's archives and other libraries. The part where you identify road signs and traffic lights was supposedly to train self-driving cars. Now it's apparently just more analytics and tracking so Google can sell you things. [1]

But since LLMs are so error-prone, and AI companies don't seem to want to pay humans to verify either that the data going into LLM training is valid or that the output is accurate, I could see something like a forced CAPTCHA being used to get unpaid labor to verify LLM data.

It's just a dystopian thought I had. I probably shouldn't have said it out loud (it might give them ideas).

[1] https://www.techradar.com/pro/security/a-tracking-cookie-far...

Suppafly 4 days ago

>Well, the original purpose was to do OCR for things like the NYT's archives and other libraries. The part where you identify road signs & traffic lights was supposedly to train self-driving cars. Now, it's apparently just more analytics & tracking for Google to sell you things.

You seem hung up on the idea of the original purpose being one specific thing. The original purpose was to create a dataset to train AIs, the first adopters were OCR programs and such, but it's not like it was created to only be used for that one thing.