nneonneo | 5 days ago |
I mean, even back in 2021 the CLIP model was getting fooled by text overlaid onto images: https://www.theguardian.com/technology/2021/mar/08/typograph... That article shows a classic example: an apple is classified as 85% Granny Smith, but taping a handwritten label reading "iPod" in front of it makes it classified as 99.7% iPod.
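(For anyone who wants to poke at this themselves: a minimal sketch of the zero-shot classification setup, assuming the HuggingFace transformers CLIP API and a local photo of an apple with an "iPod" label taped to it. Label prompts and file name are illustrative, not from the original paper.)

  from PIL import Image
  from transformers import CLIPModel, CLIPProcessor

  # Load the public OpenAI CLIP checkpoint used in many of these demos.
  model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
  processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

  # Hypothetical input: an apple with a handwritten "iPod" label in front of it.
  image = Image.open("apple_with_ipod_label.jpg")
  labels = ["a photo of a Granny Smith apple", "a photo of an iPod"]

  # CLIP scores the image against each text prompt; softmax gives per-label probabilities.
  inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
  outputs = model(**inputs)
  probs = outputs.logits_per_image.softmax(dim=1)

  for label, p in zip(labels, probs[0].tolist()):
      print(f"{label}: {p:.3f}")

With the labelled apple, the "iPod" prompt tends to dominate, which is the typographic attack the Guardian article describes.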
lupire | 4 days ago | parent |
The handwritten label was by far the dominant aspect of the "iPod" image. The only mildly interesting aspect of that attack is that it's a reminder that these systems are bad at distinguishing a thing (an iPod) from a reference to that thing (the text "iPod"). The apple has nothing to do with it, and it's bizarre that the researchers failed to understand that.