| ▲ | perching_aix 6 hours ago |
I've had a lot of misconceptions that I had to contend with over the years myself as well. Maybe this thread is a good opportunity to air the biggest one of those. I'll also touch on subbing at the end, since the post specifically calls it out.

My biggest misconception, bar none, was around what a codec is exactly, and how well specified they are. I kept hearing downright mythical-sounding claims, such as how different hardware and software encoders, and even decoders, produce different quality outputs. This sounded absolutely mental to me. I thought that when someone said AVC / H.264, there was some specification somewhere, that was then implemented, and that's it. I could not for the life of me even begin to fathom where differences in quality might seep in. Chief among these was when somebody claimed that single-threaded encoding was superior to multi-threaded encoding. I legitimately considered that I was being messed with, or that the person I was talking to simply didn't know what they were talking about.

My initial thought was that okay, maybe there's a specification, and the various codec implementations just "creatively interpret" it. This made intuitive sense to me because "de jure" vs. "de facto" distinctions are immensely common in the real world, be it for laws, standards, what have you. So I'd start differentiating and saying "okay, so this is H.264, but <implementation name>". I was pretty happy with this, but eventually something felt off enough to make me start digging again.

And then, not even very long ago, the mystery unraveled. What the various codec specifications actually describe, and what these codecs actually "are", is the on-disk bitstream format, and how to decode it. Just the decode. Never the encode. This applies to video, image, and sound formats; all lossy media formats. Except for telephony, all these codecs only ever specify the end result and how to decode it, not the way to get there.

And so suddenly the differences between implementations made sense. It isn't that they're flouting the standard: for the encoding step, there simply isn't one. The various codec implementations compete on finding the "best" way to compress information down to the same cross-compatibly decodable bitstream. It is each encoder's responsibility to craft a so-called psychovisual or psychoacoustic model, and then build a compute-efficient encoder that gets you the most bang for the buck. This is how you get differences between hardware and software encoders, and how you can get differences even between the single-threaded and multi-threaded codepaths of the same encoder: some of the approaches an encoder chooses may simply not work, or not work well, with multithreading. (There's a small sketch at the end of this comment illustrating this encode-side freedom.)

One question that escaped me then was how, e.g., HEVC / H.265 can be "more optimal" than AVC / H.264 if all these standards define is the end result and how to decode that end result. The answer is actually kind of trivial: more features. Literally just more knobs to tweak. These of course introduce some overhead, so the question becomes whether you can reliably beat that overhead to achieve parity or gain efficiency. The OP claims this is not a foregone conclusion, but doesn't substantiate it. In my anecdotal experience, it is: parity or even an efficiency gain is pretty much guaranteed.

Finally, I mentioned differences between decoder output quality. That is a bit more boring.
It is usually a matter of fault tolerance, and indeed, standards violations, such as supporting a 10-bit format in H.264 when the standard (supposedly; I never checked) only specifies 8-bit. And of course, just basic incorrectness / bugs.

Regarding subbing then: unless you're burning in subs (so-called hardsubs), all this malarkey about encoding doesn't actually matter. The only thing you really need to know about is subtitle formats and media containers. OP's writing is not really for you.
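Here's the sketch I mentioned, just to make the encode-side freedom concrete. It's a minimal illustration, assuming ffmpeg with libx264 is installed and that "input.y4m" is some raw test clip (both are hypothetical stand-ins): same source, same target quality, same H.264 bitstream format, but two very different amounts of encoder effort.

  # Minimal sketch: the H.264 *decoder* behavior is fixed by the spec, but the
  # encoder is free to spend as much or as little search effort as it likes
  # getting to a compliant bitstream. Assumes ffmpeg + libx264 is installed and
  # "input.y4m" is a raw test clip (hypothetical).
  import os
  import subprocess

  SRC = "input.y4m"  # hypothetical raw source clip

  for preset in ("ultrafast", "veryslow"):
      out = f"out_{preset}.mp4"
      # Same target quality (CRF 23), same standard (H.264); only the amount
      # of encode-side effort differs between presets.
      subprocess.run(
          ["ffmpeg", "-y", "-i", SRC, "-c:v", "libx264",
           "-preset", preset, "-crf", "23", out],
          check=True,
      )
      print(preset, os.path.getsize(out), "bytes")

  # Both outputs decode with any spec-compliant H.264 decoder, yet they will
  # typically differ in size (and, at a fixed bitrate, in quality), because
  # the spec never says how the encoder must arrive at the bitstream.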
| ▲ | dylan604 5 hours ago | parent | next [-] |
I was a DVD programmer for 10 years. There was a defined DVD spec. The problem is that not every DVD device adhered to it. Specs contain words like shall/must and other words that can be misinterpreted, and then you have people who ship an MVP as a product and don't worry about the more advanced portions of the spec.

As a specific example, DVD-Video had a random feature that discs could use. There was one brand of player that had a preset list of random numbers, so that every time you played a disc that used random, the "random" sequence would be exactly the same. This made designing DVD-Video games "interesting", as not all players behaved the same. This was when I first became aware that just because there's a spec doesn't mean you can count on it being followed in the same way everywhere.

As you mentioned, video decoders also play fast and loose with specs. That's why some players cannot decode 10-bit encodes, as that's an "advanced" feature. Some players could not decode all of the profiles/levels a codec could use according to the spec. Apple's QTPlayer could not decode the more advanced profiles/levels, which goes to show it's not only "small" devs making limited decoders.
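Purely for illustration (this is not any real player's firmware, just a toy sketch of the behavior described above): if "random" is drawn from a fixed, preset table whose position resets on every playback, every run of the disc sees the exact same "random" sequence.

  # Toy illustration of a "preset random" player: a fixed built-in table plus
  # a counter that resets at the start of playback yields the same "random"
  # picks every single time. Names and values are hypothetical.
  PRESET_TABLE = [3, 7, 1, 9, 4, 2, 8]  # hypothetical built-in list

  def next_random(state):
      """Return the next 'random' value and the advanced state."""
      value = PRESET_TABLE[state % len(PRESET_TABLE)]
      return value, state + 1

  state = 0  # reset to the same starting point on every playback
  picks = []
  for _ in range(5):
      value, state = next_random(state)
      picks.append(value)

  print(picks)  # always [3, 7, 1, 9, 4] -- the "random" order never varies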
| ▲ | memoriuaysj 5 hours ago | parent | prev [-] |
What if I told you the same thing is true for plain lossless compression, like .zip files? The compressor (encoder) decides exactly how to pack the data; the format doesn't dictate a single "correct" output. You can do a better or worse job at it, which is why we have "better" zlib implementations that compress more tightly.
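A quick way to see this with nothing but Python's standard library: two compression efforts, two different compressed sizes, one identical round-trip. (Levels 1 and 9 are just the two extremes zlib exposes; the exact sizes will vary with the input.)

  # Minimal sketch: the DEFLATE format fixes what the decompressor must accept,
  # but the compressor is free to search harder or not. Same input, same
  # format, same round-trip result, different compressed sizes.
  import zlib

  data = b"the quick brown fox jumps over the lazy dog " * 1000

  fast = zlib.compress(data, level=1)   # minimal effort
  tight = zlib.compress(data, level=9)  # maximal effort

  print(len(fast), len(tight))  # the sizes typically differ; level 9 is smaller
  assert zlib.decompress(fast) == zlib.decompress(tight) == data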