| ▲ | wincy 12 hours ago | |
You’d just run every picture through CLIP, essentially you run an image generator backwards. Instead of text to image like most end users use when using something like stable diffusion (been awhile since I’ve done this), it can do the exact opposite and generate tokens (just words in this case) to describe the input image. I’d guess famous characters like Bart and Marge and other Simpsons characters would likely be known by the tokenizer so it’d be pretty easy. So then you’d be able to guess. Feel free to correct me on small details if anyone has this more fresh in their mind but I’m roughly correct here. | ||