nerdsniper 4 hours ago

> the GPU could be used to encode h264, and apparently yes, but it's not really worth it compared to CPU.

It depends on what you're going for. If you're trying to do the absolute highest fidelity for archiving a Blu-ray disc, AMD Epyc reigns supreme, because you need a lot of flexibility to really dial in the quality settings. Pirates over at PassThePopcorn obsess over minute differences in quality that I absolutely cannot notice with my eyes, and I'm glad they do! Their encodes look gorgeous. That quality can't be achieved with the silicon of hardware-accelerated encoders, and due to driver limitations (not silicon limitations) it also can't be achieved by CUDA cores / execution engines / etc. on GPUs.

But if you're okay with a small amount of quality loss, the optimal move for the highest # of simultaneous encodes, or the fastest FPS per encode, is to skip CPU and GPU "general compute" entirely: hardware-accelerated encoding can get you 8-30 simultaneous 1080p encodes on a very cheap Intel iGPU using QSV/VAAPI. This means using special sections of silicon whose sole purpose is H.264/H.265/etc. encoding, plus cropping / scaling / color adjustments. The "hardware accelerators" I'm talking about are generally present in the CPU/iGPU/GPU/SoC, but they are not general purpose - they can't be used for CUDA/ROCm/etc. Either they're being used for your video pipeline specifically, or they're not being used at all.
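To make this concrete, here's a minimal sketch of what a VAAPI hardware encode looks like in ffmpeg. The flags are standard ffmpeg/VAAPI options; the device path and file names are placeholders for whatever your box actually has. The script just prints the command, since actually running it needs an Intel iGPU with a VAAPI driver - copy-paste it to run for real:

```shell
# Sketch only: encode on the iGPU's dedicated video silicon via VAAPI.
# /dev/dri/renderD128 is the usual render node on Intel; adjust to taste.
CMD="ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mp4 \
  -vf format=nv12,hwupload -c:v h264_vaapi -qp 24 output.mp4"
echo "$CMD"   # printed, not executed: requires actual VAAPI hardware
```

The `format=nv12,hwupload` filter chain uploads frames into GPU memory in a pixel format the encoder block accepts; everything after that runs on the fixed-function encoder, not on CPU cores or shaders.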

I'm doing this now for my startup. We've tuned it to use 0% of the CPU and 0% of the Render/3D engine of the iGPU (the most "general purpose" section of the GPU), leaving those completely free for ML models, and to utilize only the Video and Video Enhance engines.
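A sketch of what such a pipeline can look like with ffmpeg and Quick Sync (this isn't our exact command; the flags are standard ffmpeg/QSV options, file names invented). `-hwaccel_output_format qsv` keeps decoded frames in GPU memory, and `vpp_qsv` does the scaling on the Video Enhance engine rather than the 3D/Render engine or the CPU, so frames never round-trip through general compute:

```shell
# Sketch: decode -> scale -> encode entirely on the iGPU's video engines.
CMD="ffmpeg -hwaccel qsv -hwaccel_output_format qsv -i camera.mp4 \
  -vf vpp_qsv=w=1280:h=720 -c:v h264_qsv -global_quality 25 out.mp4"
echo "$CMD"   # printed, not executed: requires an Intel iGPU with QSV
```

If you drop `-hwaccel_output_format qsv` or use a software filter like plain `scale`, frames get copied back to system memory and the CPU quietly re-enters the pipeline - which is exactly the thing you're trying to avoid.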

For something like Frigate NVR, that's perfect. You can support a large # of cameras on cheap hardware and your encoding/streaming tasks don't load any silicon used for YOLO, other than adding to overall thermal limits.

Video encoding is a very deep topic. You need to have benchmarks, you need to understand not just "CPU vs GPU" ... but down to which parts of the GPU you're using. There's an incredible amount of optimization you can do for your specific task if you take the time to truly understand the systems level of your video pipeline.

Aurornis 4 hours ago | parent [-]

> But if you're okay with a small amount of quality loss,

I wouldn't call it a small quality loss. The hardware encoders are tuned for different priorities like live streaming. They have lower quality and/or much higher bitrate.

> If you're trying to do the absolute highest fidelity for archiving a blu-ray disk, AMD Epyc reigns supreme.

You don't need any special CPU to get the highest fidelity as long as you're willing to wait. For archiving purposes any CPU will do; just be prepared to let it run for a long time.
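That route looks something like this (a sketch with standard ffmpeg/libx264 options; file names and the CRF value are illustrative - archival folks pick their own numbers). Slower presets spend more CPU time searching for better compression decisions at the same quality target, which is why "any CPU, just wait" works:

```shell
# Sketch: software encode at a slow preset for quality-per-bit, not speed.
CMD="ffmpeg -i bluray_remux.mkv -c:v libx264 -preset veryslow -crf 16 \
  -c:a copy archive.mkv"
echo "$CMD"   # printed, not executed: a real run can take many hours
```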

nerdsniper 4 hours ago | parent [-]

> You don't need any special CPU to get the highest fidelity as long as you're willing to wait.

Correct, but Epyc "reigns supreme" for anyone who cares about performance / total FPS throughput, which is relevant for anyone who cares about TFA at all - the whole point of using a GPU is to "go faster", and that's what Epyc offers for use cases that also demand extreme fidelity.

> I wouldn't call it a small quality loss. The hardware encoders are tuned for different priorities like live streaming. They have lower quality and/or much higher bitrate.

Sure. It absolutely depends on your use case. We're using it for RDP/KVM-type video, so for us the quality loss is indeed quite "small". Our users care more about "can I read the text clearly?" and less about color banding. The hardware accelerators do a great job with text clarity, so for our use case the quality loss isn't very noticeable. I will admit the colors are very noticeably distorted, but the shapes are correct and the contrast/sharpness is good.

Using 0% of the CPU and GPU for encoding is a HUGE win that's totally worth it for us - hardware costs stay super low. Using really old, bottom-of-the-barrel CPUs for 30+ simultaneous encodes feels like cheating. Hardware-accelerated encoding also provides another massive win by tangibly reducing latency for our users vs CPU/GPU encoding (it's not just throughput that improves; each live frame gets through the pipeline faster too).
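For flavor, the latency side mostly comes down to a few encoder knobs (this isn't our production command; these are standard ffmpeg/QSV options with example values). No B-frames means no frame reordering delay, and a shallow async queue means frames don't sit buffered inside the encoder:

```shell
# Sketch: low-latency live encode tuning on Quick Sync.
#   -bf 0          -> no B-frames, so no reordering latency
#   -async_depth 1 -> minimal internal frame queue
#   -g 60          -> keyframe interval (example value)
CMD="ffmpeg -f lavfi -i testsrc2=size=1920x1080:rate=30 \
  -c:v h264_qsv -bf 0 -async_depth 1 -g 60 -f null -"
echo "$CMD"   # printed, not executed: requires QSV hardware
```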

I wouldn't use COTS hardware accelerators for archiving Blu-ray videos. Hell, I'm not even aware of any COTS hardware accelerators that support HDR - they probably exist, but I've never stumbled across one. But hardware-accelerated encoding really is ideal for a lot of other stuff, especially when you care about CapEx at scale. If you're at the scale of Netflix or YouTube, you can get custom silicon made that provides ASIC acceleration at any quality you like. That said, they seem to choose to degrade video quality to save money, to the point that 10-20% of their users hate the quality (myself included - quality is one of the primary reasons I use PassThePopcorn instead of the legal streaming services). But that's a business choice, not a technical limitation of ASIC acceleration - assuming you have the scale to pay for custom silicon. COTS solutions absolutely DO have a noticeable quality loss, as you argue.

Aurornis 4 hours ago | parent [-]

> We're using it for RDP/KVM-type video, so for us the quality loss is indeed quite "small". Our users care more about "can I read the text clearly?" and less about color-banding. The hardware accelerators do a great job with text clarity so for our use-case it's not much of a noticeable quality loss.

This is a perfect use case for hardware video acceleration.

The hardware encoder blocks are great for anything live-streaming related. The video they produce uses a much higher bitrate and/or has lower quality than what you could get with a CPU encoder, but if doing a lot of real-time encodes is important, they deliver.
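If anyone wants to quantify that trade-off for their own content rather than eyeball it, one common approach is scoring the hardware encode against the source with ffmpeg's libvmaf filter (requires an ffmpeg build with libvmaf enabled; file names here are placeholders):

```shell
# Sketch: compute a VMAF quality score for an encode vs. its reference.
CMD="ffmpeg -i hw_encode.mp4 -i reference.mp4 -lavfi libvmaf -f null -"
echo "$CMD"   # printed, not executed: needs ffmpeg built with libvmaf
```

Run the same comparison for a CPU encode at matched bitrate and you have an apples-to-apples number for how much quality the hardware block is actually giving up.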