GaggiX 4 hours ago

From what I remember, Glaze uses a small CLIP model and LPIPS (based on VGG) for its adversarial loss, which is why it's so ineffective against larger, better-trained models.

It uses SD to do a style transfer on the image via image-to-image, then it runs gradient descent on the image itself to lower the difference between the CLIP embeddings of the (perturbed) original and the style-transferred image while trying to maintain LPIPS. After every step the result is normalized so it doesn't exceed a certain threshold from the original image.
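
To make that loop concrete, here's a toy sketch of the idea, not Glaze's actual code. Everything in it is an assumption for illustration: a random linear map `W` stands in for the CLIP image encoder, `styled` is just a random vector standing in for the SD image-to-image output, per-pixel clipping to `eps` stands in for the "don't exceed a threshold" step, and the LPIPS term is left out entirely. The names `embed`, `cloak`, and `sq_dist` are made up.

```python
import random

def embed(W, x):
    """Toy linear 'encoder': one dot product per embedding dimension."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def cloak(W, original, styled, eps=0.05, lr=0.002, steps=200):
    """Pull `original` toward `styled` in embedding space, within +/- eps per pixel."""
    target = embed(W, styled)
    delta = [0.0] * len(original)
    for _ in range(steps):
        x = [o + d for o, d in zip(original, delta)]
        diff = [e - t for e, t in zip(embed(W, x), target)]
        # Gradient of ||W x - target||^2 with respect to x is 2 * W^T diff.
        grad = [2.0 * sum(W[i][j] * diff[i] for i in range(len(W)))
                for j in range(len(x))]
        # Gradient step, then clip the perturbation back into the budget.
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grad)]
    return [o + d for o, d in zip(original, delta)]

# Hypothetical toy data: 16 "pixels", 4 embedding dimensions.
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]
original = [rng.random() for _ in range(16)]
styled = [rng.random() for _ in range(16)]

cloaked = cloak(W, original, styled)
before = sq_dist(embed(W, original), embed(W, styled))
after = sq_dist(embed(W, cloaked), embed(W, styled))
max_change = max(abs(c - o) for c, o in zip(cloaked, original))
print(f"embedding distance: {before:.3f} -> {after:.3f}, max pixel change: {max_change:.3f}")
```

The thing to notice is that the perturbation is tuned against one specific encoder (`W` here). Nothing in the loop makes it move the embeddings of a different encoder in the same way.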

So essentially it's an adversarial attack against a small CLIP model, even though today's models are much more robust than that.