HarHarVeryFunny 5 days ago

Well, there are use cases for lossy compression as well as lossless, and nobody is saying they are the same. If you really need to heavily compress to reduce file size or transmission bandwidth, then you'll likely need a lossy codec. The question then becomes how to minimize the reduction in perceived quality of whatever you are compressing (photos, video, audio), which comes down to how the corresponding human sensory/perceptual systems work.

For vision we are much more sensitive to large-scale detail (corresponding to low-frequency FFT components) than to fine-scale detail (corresponding to high-frequency components). Given the goal of minimizing the reduction in perceived quality, this is an obvious place to start: throw away some of that fine detail (the highest-frequency components). It may not even be noticeable at all if the detail you are discarding is at a finer resolution than we are able to perceive.
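The "throw away high-frequency components" idea can be sketched in a few lines of numpy. This is an illustration, not JPEG itself: JPEG uses an 8x8 DCT per block with quantization rather than the hard triangular cutoff used here, and the test image (a gradient plus noise) is made up for the demo.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector.
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos((2 * i + 1) * k * np.pi / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

D = dct_matrix(8)

def dct2(block):    # forward 2-D DCT of an 8x8 block
    return D @ block @ D.T

def idct2(coeffs):  # inverse 2-D DCT
    return D.T @ coeffs @ D

# A smooth gradient block (large-scale detail) plus fine-grained noise.
rng = np.random.default_rng(0)
block = np.linspace(0, 255, 64).reshape(8, 8) + rng.normal(0, 4, (8, 8))

coeffs = dct2(block)

# Keep only the low-frequency corner (row + col < 4), zeroing the 54
# fine-detail coefficients -- a crude version of what lossy codecs do.
mask = np.add.outer(np.arange(8), np.arange(8)) < 4
approx = idct2(coeffs * mask)

# Mean absolute error stays small because the block is mostly smooth.
err = np.abs(approx - block).mean()
```

Because the gradient's energy is concentrated in the low-frequency coefficients, the reconstruction from only 10 of the 64 coefficients stays close to the original; what gets lost is mostly the noise, i.e. the fine detail.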

It also turns out that human vision is more sensitive to brightness than to color (due to the numbers of retinal rods vs cones, etc.), so compression can take advantage of that as well to minimize perceptual degradation, which is what JPEG does. First convert the image from RGB to YUV color space, where the Y component corresponds to brightness and the U,V components carry the color information. Then compress the color information more heavily than the brightness: apply an FFT (actually a DCT) separately to each of the Y, U, V components, and throw away more of the high-frequency (fine-detail) color information than brightness information.

But, yeah, there is no magic and lossy compression is certainly going to be increasingly noticeable the more heavily you compress.

astrange 4 days ago | parent [-]

> large scale detail (corresponding to low frequency FFT components)

This isn't true in practice - images are not bandlimited like audio so there aren't really visual elements of images corresponding to low frequency cosine waves. That's why the lowest frequency DCT coefficient in a JPEG image is 16x16 pixels, which is hardly large scale.

But you do quantize all components of the DCT transform, not just the highest ones.

Actually in the default JPEG quantization matrix it's the coefficient to the upper-left of the last one that gets the most quantization: https://en.wikipedia.org/wiki/Quantization_(image_processing...

HarHarVeryFunny 4 days ago | parent [-]

Sure, but quantization is just another level of lossiness once you've already decided what information to throw away.

In terms of understanding how JPEG compression works, and how it relates to human perception, I'd say that in order of importance it's:

1) Throw away fine detail by discarding high frequency components

2) More heavily compress/discard color than brightness detail (using YUV)

3) Quantize the frequency components you are retaining
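Step 3 can be sketched with the example luminance quantization table from Annex K of the JPEG spec (the "default" table the parent comment refers to). Dividing each DCT coefficient by its table entry and rounding is where most of the actual information loss happens: large entries, concentrated toward the high-frequency corner, crush small coefficients to zero.

```python
import numpy as np

# Example luminance quantization table from Annex K of the JPEG spec.
# Note the largest entries cluster near (not exactly at) the
# bottom-right, highest-frequency corner.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs):
    # Divide by the table entry and round; this is the lossy step.
    return np.round(coeffs / Q).astype(int)

def dequantize(q):
    return q * Q

# Two coefficients of equal magnitude, one DC and one high-frequency:
coeffs = np.zeros((8, 8))
coeffs[0, 0] = 40.0
coeffs[7, 7] = 40.0
q = quantize(coeffs)
# The DC term survives (40/16 rounds to 2); the high-frequency term
# is crushed to zero by its much larger table entry (99).
```

Decoders multiply back by the same table, so a zeroed coefficient is gone for good; that is why heavy quantization of the high-frequency corner amounts to discarding fine detail.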

astrange 2 days ago | parent [-]

> 1) Throw away fine detail by discarding high frequency components

The reason it works is that fine detail is almost completely correlated across color channels, so if you keep only the Y plane at full resolution, that detail is still preserved.

You couldn't just throw it out in RGB space because, e.g., text would be unreadable.
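That point can be demonstrated with a made-up "text-like" patch: a one-pixel black stroke on white, which has identical fine detail in R, G and B. Downsampling the RGB channels directly smears the stroke to gray, while the full-resolution luma plane keeps the edge at full contrast:

```python
import numpy as np

# Hypothetical text-like patch: white paper with a 1-pixel black stroke.
img = np.full((8, 8, 3), 255.0)
img[:, 3] = 0.0  # black vertical stroke, identical in all three channels

def downsample_2x(plane):
    # Average 2x2 blocks, then blow back up -- what subsampling does.
    h, w = plane.shape
    small = plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

# Subsampling the RGB channels directly: the stroke averages with its
# white neighbor and becomes mid-gray (unreadable "text").
blurred = np.stack([downsample_2x(img[..., c]) for c in range(3)], axis=-1)

# The luma plane, kept at full resolution, still contains the black
# stroke at full contrast, because the stroke's detail lives equally
# in R, G and B -- i.e. almost entirely in Y.
y = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
```

So chroma subsampling gets away with halving resolution precisely because the cross-channel-correlated detail ends up in the plane that is never subsampled.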