astrange 5 days ago

> A compression algorithm can then remove high-frequency information, which corresponds to small details, without drastically changing how the image looks to the human eye.

I slightly object to this. Removing small details = blurring the image, which is actually quite noticeable.

For some reason everyone really wants to assume this is true, so for the longest time people would invent new codecs that were prone to this (in particular wavelet-based ones like JPEG-2000 and Dirac) and then nobody would use them because they were blurry. I think this is because it's easy to give up on actually looking at the results of your work and instead use a statistic like PSNR, which turns out to be easy to cheat.

HarHarVeryFunny 5 days ago | parent | next [-]

Well, there are use cases for lossy compression as well as lossless, and nobody is saying they are the same. If you really need to compress heavily to reduce file size or transmission bandwidth, then you'll likely need a lossy codec. The question then becomes how to minimize the reduction in perceived quality of whatever you are compressing (photos, video, audio), which comes down to how the relevant human sensory/perceptual systems work.

For vision we are much more sensitive to large-scale detail (corresponding to low-frequency FFT components) than to fine-scale detail (corresponding to high-frequency components), so given the goal of minimizing the reduction in perceived quality this is an obvious place to start: throw away some of that fine detail (the highest-frequency FFT components). It may not even be noticeable at all if the detail being discarded is finer than we are able to perceive.
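Roughly, in NumPy/SciPy terms, that "throw away fine detail" step looks something like the sketch below. The 8x8 block size matches JPEG, but the hard frequency cutoff (the `keep` parameter) is purely illustrative - real codecs quantize coefficients rather than zeroing them outright:

```python
import numpy as np
from scipy.fft import dctn, idctn

def drop_high_frequencies(block, keep=4):
    """Zero out the high-frequency DCT coefficients of an 8x8 block.

    'keep' is an illustrative cutoff: only coefficients whose (row + col)
    index sum is below it survive. Real codecs quantize coefficients
    instead of hard-zeroing them like this.
    """
    coeffs = dctn(block, norm='ortho')       # 2-D DCT of the block
    rows, cols = np.indices(coeffs.shape)
    coeffs[rows + cols >= keep] = 0.0        # discard the fine-detail terms
    return idctn(coeffs, norm='ortho')       # back to pixel space

# Toy example: an 8x8 block with a sharp vertical edge gets visibly smoothed.
block = np.zeros((8, 8))
block[:, 4:] = 255.0
print(np.round(drop_high_frequencies(block)))
```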

It also turns out that human vision is more sensitive to brightness than to color (due to the relative numbers of retinal rods vs cones, etc.), so compression can take advantage of that too to minimize perceptual degradation, which is what JPEG does: first convert the image from RGB to YUV color space, where the Y component carries brightness and the U,V components carry the color information, then compress the color information more heavily than the brightness, by separately applying the FFT (actually a DCT) to each of the Y, U, V components and throwing away more high-frequency (fine-detail) color information than brightness.
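A rough sketch of that split, using the standard JFIF/BT.601 conversion weights and simple 2x2 averaging for the chroma subsampling (a real encoder handles rounding, image edges and filtering more carefully):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range JFIF RGB -> YCbCr conversion (BT.601 weights)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_420(plane):
    """Average 2x2 pixel groups: the '4:2:0' halving in each direction."""
    h, w = plane.shape
    return plane[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Keep Y at full resolution, store Cb/Cr at a quarter of the pixel count.
rgb = np.random.randint(0, 256, (16, 16, 3)).astype(float)
y, cb, cr = rgb_to_ycbcr(rgb)
cb_small, cr_small = subsample_420(cb), subsample_420(cr)
print(y.shape, cb_small.shape)   # (16, 16) (8, 8)
```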

But, yeah, there is no magic and lossy compression is certainly going to be increasingly noticeable the more heavily you compress.

astrange 4 days ago | parent [-]

> large scale detail (corresponding to low frequency FFT components)

This isn't true in practice - images are not bandlimited the way audio is, so there aren't really visual elements of images corresponding to low-frequency cosine waves. That's why the lowest-frequency DCT coefficient in a JPEG spans at most a 16x16-pixel block (8x8 for luma), which is hardly large scale.

But you do quantize all the coefficients of the DCT, not just the highest-frequency ones.

Actually, in the default JPEG quantization matrix, it's a coefficient just up and to the left of the last one that gets the coarsest quantization step: https://en.wikipedia.org/wiki/Quantization_(image_processing...
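For reference, that default luminance table (from Annex K of the JPEG spec) and the quantize/dequantize step it drives look roughly like this - note that every coefficient gets a step size, including the DC term, and the largest step (121) sits near, but not at, the bottom-right corner:

```python
import numpy as np

# Standard JPEG luminance quantization table (spec Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

def quantize(dct_block, table=Q_LUMA):
    """Divide each DCT coefficient by its table entry and round.

    Even the DC term (top-left) is quantized; the coarsest steps sit in
    the high-frequency region, but nothing is exempt.
    """
    return np.round(dct_block / table).astype(int)

def dequantize(q_block, table=Q_LUMA):
    """Decoder side: multiply back; the rounding error is the loss."""
    return q_block * table
```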

HarHarVeryFunny 4 days ago | parent [-]

Sure, but quantization is just another level of lossiness once you've already decided what information to throw away.

In terms of understanding how JPEG compression works, and how it relates to human perception, I'd say that in order of importance it's:

1) Throw away fine detail by discarding high frequency components

2) More heavily compress/discard color than brightness detail (using YUV)

3) Quantize the frequency components you are retaining

astrange 2 days ago | parent [-]

> 1) Throw away fine detail by discarding high frequency components

The reason it works is that fine detail is almost completely correlated across the color channels, so if you keep only the Y plane at full resolution, it still stores that detail.

You couldn't just throw it away in RGB space, because e.g. text would become unreadable.
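A toy illustration of that point - BT.601 luma weights and nearest-neighbour resampling stand in for a real codec here, but they show why subsampling only the colour planes leaves "text-like" detail intact while subsampling every channel destroys it:

```python
import numpy as np

def to_y(rgb):
    # BT.601 luma weights; fine detail from all three channels lands in this plane.
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def down_up(plane):
    # Halve then double the resolution: a crude stand-in for 2x subsampling.
    small = plane[::2, ::2]
    return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

# A black-and-white "text-like" pattern: 1-pixel-wide vertical strokes.
rgb = np.zeros((8, 8, 3))
rgb[:, ::2, :] = 255.0

# Subsample only the colour planes: the luma plane, which carries the strokes,
# is untouched, so the text survives.
y_kept = to_y(rgb)

# Subsample every RGB channel instead: the strokes disappear from the luma too.
rgb_sub = np.stack([down_up(rgb[..., c]) for c in range(3)], axis=-1)
y_lost = to_y(rgb_sub)

print(np.array_equal(y_kept, to_y(rgb)))   # True: detail preserved
print(np.abs(y_lost - to_y(rgb)).max())    # 255.0: the strokes are wiped out
```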

goalieca 5 days ago | parent | prev [-]

You might be surprised to learn that videos and many images are broken down into separate brightness and color planes. Brightness is typically stored at close to the full resolution of the image (e.g. 4K), but color is often stored at a quarter of the sample count. That's literally blurring the picture.

astrange 5 days ago | parent [-]

…Why would you write a comment like this assuming I don't know what 4:2:0 is when I know what a wavelet is?

It doesn't have the same effect because the original high frequency details are correlated and so they're preserved in the Y channel.