crazygringo 4 hours ago

I think it's fascinating because it seems to be a completely different type of compression.

You can see it in the hair as well. It seems very clear that it is engaging in a kind of texture synthesis.

So it seems to be looking at an area, and capturing the textural quality. And then reproducing that, so the overall effect is the same, but individual fibers or fuzzy bits are randomly generated from scratch.

And so yes, if you zoom in enough, the knitting looks completely wrong because the regular geometric pattern of irregular yarn it is made of has been replaced by a completely irregular pattern of irregular yarn.

In other words, it is essentially hallucination of details on a micro scale but not on a macro scale.
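A toy sketch of that idea, not the codec's actual method: assume the "texture" of a patch can be summarized by just its mean and standard deviation (a deliberately crude stand-in, and the names `encode_texture` / `decode_texture` are made up for illustration). The decoder then synthesizes fresh values that match those statistics, so the patch looks the same in aggregate while the individual pixels are invented.

```python
import random
import statistics

def encode_texture(patch):
    """Keep only summary statistics of the patch (a hypothetical 'texture code')."""
    return statistics.fmean(patch), statistics.pstdev(patch)

def decode_texture(code, n, seed=0):
    """Synthesize n fresh samples, rescaled to match the stored statistics."""
    mean, std = code
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = statistics.fmean(noise), statistics.pstdev(noise)
    # Normalize the noise, then impose the original mean and spread.
    return [mean + std * (x - m) / s for x in noise]

# A made-up 64-pixel "fuzzy" patch: mid-gray with random fiber-like variation.
src_rng = random.Random(42)
original = [128 + src_rng.gauss(0, 20) for _ in range(64)]

code = encode_texture(original)              # 2 numbers instead of 64
restored = decode_texture(code, len(original))
```

The restored patch has the same mean and spread as the original (the macro impression), but no individual value is tied to the pixel it replaces (the micro detail).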

And I think that raises a really interesting philosophical question of what we consider to be valid image reconstruction from lossy compression.

Because on the one hand, this is no different from blurriness or even the kind of blocky JPEG compression we are familiar with. It's just pixels that are wrong. Those blocks don't appear in the original image. The blurriness isn't there in the original image.

But on the other hand, we see blurriness as being somehow more "honest", and we are easily able to recognize that blockiness is an artifact.

Whereas with textural hallucination, it is no longer clear what is being filled in versus what is original, because it's doing such a good job of emulating so many aspects of the original texture.

And it's really hard to say if one approach is better or worse than the other. It's probably more accurate to say that one is more appropriate than the other in different contexts. Like if it is just a normal news photograph, I am perfectly happy with a sharper image because it's not changing anything substantial – it's not changing the face of a world leader or the number of people in the photo. But on the other hand, if I am doing online shopping for shirts and I want to be able to zoom in on the texture, then it's incredibly important that the texture be accurate and not loosely hallucinated.

srean 4 hours ago | parent | next [-]

This is a potential problem in "AI" denoising as well.

These denoising models, the autoencoders more directly so, work by (lossily) mapping the raw input to a very low dimensional representation. The other part generates the desired image back from the low-d representation.

The problem is that nothing, in the vanilla versions, prevents the low-d version from being a semantic representation such as "Moon", "dark hair", etc., with the generative part taking cues from that semantic representation to generate a sub-image.
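A deliberately exaggerated sketch of that failure mode, assuming the low-d code has collapsed all the way to a label. The codebook, labels, and values here are invented for illustration; a real autoencoder learns a continuous code rather than a lookup table.

```python
# Hypothetical codebook: each "semantic" label maps to a stored prototype patch.
PROTOTYPES = {
    "moon": [0.9, 0.8, 0.9, 0.7],
    "dark_hair": [0.1, 0.2, 0.1, 0.2],
}

def encode(patch):
    """Lossy encoder: collapse the patch to the label of the nearest prototype."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(patch, PROTOTYPES[label]))
    return min(PROTOTYPES, key=dist)

def decode(label):
    """Generator: emit the prototype's detail, whatever the input looked like."""
    return list(PROTOTYPES[label])

blurry_input = [0.6, 0.6, 0.6, 0.6]  # a featureless bright blob
label = encode(blurry_input)
output = decode(label)
```

Here `label` comes out as `"moon"`, and `output` carries the prototype's variation even though the input had none. The detail comes from the generator, not from the measurement.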

The Samsung phone Moon image was likely a result of deliberate choice / company policy, but these things can happen without explicit intent.

thraway54321 4 hours ago | parent | prev [-]

I disagree that it's only on a micro scale. If you look at the picture of the parrots, it completely changes the black/white pattern in the face of the red parrot. And if you look at the picture of the green bicycle, the spot where the luggage rack attaches close to the center of the rear wheel is completely mangled, in contrast to the more "blurry" picture, where you can clearly see the bolts where it's attached. The rods going from the wheel hub up to the luggage rack also look very jagged and weird, whereas they look fine in the blurry one. There are certainly other errors as well, but those were the most jarring I noticed at a quick glance. I don't think a compression algorithm that does this poorly on cherry-picked examples is going to fly when you start throwing real pictures at it. If you are going to screw with the ground truth, I bet you could get better results by throwing the blurry pictures into one of those "AI" upscalers.

crazygringo 4 hours ago | parent [-]

I would say all of those examples you are picking are at the micro scale. Obviously it's a somewhat arbitrary division between macro and micro, what you consider to be the macro objects versus what you consider to be the micro details.

And this is also going to depend on the level of compression being chosen. Obviously, the greater the compression, the lesser the fidelity. The lesser the compression, the greater the fidelity.