doctorpangloss 8 hours ago

What is the use case? Okay, ultra-low-latency streaming. That is good. But if you are sending the frames via some protocol over the network, like WebRTC, they will be touching the CPU anyway. Software encoding of 4K H.264 is real time on a single thread on 65 W, decade-old CPUs, with low latency. The CPU encoders are much better quality and more flexible. So it's very difficult to justify the level of complexity needed for hardware video encoding. There is absolutely no need for it for TV streaming, for example. But people who have no need for it keep being obsessed with it.

IMO vendors should stop reinventing hardware video encoding and instead spend the programmer time making libwebrtc and libvpx better suit their particular use case.

chillfox 7 hours ago | parent | next [-]

The article explains it. This is not for streaming over the web, but for editing professional grade video on consumer hardware.

doctorpangloss 7 hours ago | parent [-]

davinci resolve is the only commercial NLE with any kind of vulkan support, and it is experimental

prores decodes faster than realtime single threaded on a decade old CPU too

it doesn't make sense. it's much different with, say, a video game, where a texture will be loaded once into VRAM, and then yes, all the work will be done on the GPU. a video will have CPU IO every frame; you are still doing a ton of CPU work. i don't know why people are talking about power efficiency: in a pro editing context, your CPU will be very, very busy with these IO threads, including and especially in ffmpeg with hardware encoding/decoding, no less. it doesn't look anything like a video game workload, which is what this stack is designed for.

pandaforce 6 hours ago | parent | next [-]

6K ProRes streams that consumer cameras record are still too heavy for modern CPUs to decode in real time. Not to mention the 12K ProRes that professional cameras output.
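A quick back-of-envelope check makes the "too heavy" point concrete. The resolutions and frame rate below are illustrative assumptions (actual camera formats vary); the point is just how fast the pixel rate grows past the 4K that the single-thread argument is based on:

```python
# Pixel throughput at assumed resolutions/frame rates (illustrative, not
# tied to any specific camera model).
def pixels_per_second(width, height, fps):
    return width * height * fps

uhd_4k = pixels_per_second(3840, 2160, 30)       # 4K baseline
six_k = pixels_per_second(6144, 3240, 30)        # assumed 6K camera format
twelve_k = pixels_per_second(12288, 6480, 30)    # assumed 12K camera format

print(f"6K is {six_k / uhd_4k:.1f}x the pixel rate of 4K")
print(f"12K is {twelve_k / uhd_4k:.1f}x the pixel rate of 4K")
```

So even if a decade-old CPU decodes 4K faster than real time, a 12K stream is nearly an order of magnitude more pixels per second.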

lostmsu 7 hours ago | parent | prev [-]

That reduces power consumption. So it should improve the battery life of laptops and help the environment a little.

nerdsniper 4 hours ago | parent | prev | next [-]

> If you are sending the frames via some protocol over the network, like WebRTC, it will be touching the CPU anyway. Software encoding of 4K h264 is real time on a single thread on 65w, decade old CPUs, with low latency.

This is valid for a single stream, but the equation changes when you're trying to squeeze the highest # of simultaneous streams into the least amount of CapEx possible. Sure, you still have to transfer it to the CPU cache just before you send it over WebRTC/HTTP/whatever, but you unlock a lot of capacity by using all the rest of the silicon as much as you can. Being able to use a budget/midrange GPU instead of a high-end ultra-high-core-count CPU could make a big difference to a business with the right use-case.
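A toy per-stream CapEx comparison illustrates the tradeoff. Every price and session count below is a made-up placeholder, not a benchmark; the shape of the math is the point:

```python
# Hypothetical CapEx-per-stream comparison (all numbers are assumptions
# for illustration only).
cpu_cost = 2500.0        # assumed price of a high-core-count server CPU
cpu_streams = 16         # assumed simultaneous 4K software encodes it sustains

gpu_cost = 500.0         # assumed price of a midrange GPU
gpu_streams = 20         # assumed simultaneous hardware-encoder sessions

cpu_per_stream = cpu_cost / cpu_streams
gpu_per_stream = gpu_cost / gpu_streams
print(f"CPU: ${cpu_per_stream:.2f}/stream, GPU: ${gpu_per_stream:.2f}/stream")
```

Under these (assumed) numbers the GPU path is several times cheaper per stream, which is exactly the high-density case where hardware encoding pays for its complexity.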

That said, TFA doesn't seem to be targeting that kind of high stream density use-case either. I don't think e.g. Frigate NVR users are going to switch to any of the mentioned technologies in this blog post.

pandaforce 7 hours ago | parent | prev | next [-]

The article explicitly mentions that mainstream codecs like H264 are not the target. This is for very high bitrate high resolution professional codecs.

jpc0 7 hours ago | parent | prev | next [-]

I'm not entirely sure that this is true.

I haven't actually looked into this, but it might not be out of the realm of possibility: you are generating a frame on the GPU, and if you can also encode it there (whether with NVENC or Vulkan doesn't matter), you could then DMA the encoded bitstream to the NIC while using the CPU only to process the packet headers, assuming that cannot also be handled by the GPU/NIC.

nerdsniper 4 hours ago | parent [-]

You can also often DMA video coming in through peripherals to get it straight into the GPU, skipping the CPU.

eptcyka 8 hours ago | parent | prev | next [-]

It will be more energy efficient. And the CPU is free to jit half a gig of javascript in the meantime.

temp0826 7 hours ago | parent [-]

It's hugely more efficient; if you're on a battery-powered device it could mean hours more of playback time. It's pretty insane just how much better it is (I go through a bit of extra effort to make sure it's working for me, since hw decoding isn't included in some distros).

hrmtst93837 3 hours ago | parent | prev | next [-]

If the frames already live on the GPU, pulling them over PCIe just to feed a CPU encoder is wasted bandwidth and latency.
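A rough sketch of that readback cost, assuming uncompressed 4K RGBA at 60 fps (an assumption for illustration; real pipelines often use NV12/P010 with fewer bytes per pixel):

```python
# Approximate PCIe traffic for reading uncompressed frames back to the CPU
# (assumed 4K RGBA at 60 fps; formats and rates vary in practice).
width, height, bytes_per_pixel, fps = 3840, 2160, 4, 60

bytes_per_frame = width * height * bytes_per_pixel   # ~33 MB per frame
gb_per_second = bytes_per_frame * fps / 1e9          # ~2 GB/s, one direction

print(f"{bytes_per_frame / 1e6:.1f} MB/frame, {gb_per_second:.2f} GB/s readback")
```

That is sustained bandwidth and latency spent purely on moving pixels that were already resident on the GPU.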

xattt 8 hours ago | parent | prev [-]

It’s a leftover mindset from the mid-2000s, when GPGPU became possible and additional performance was “unlocked” from otherwise under-utilized silicon.