▲ IgorPartola a day ago
If you are doing something like a GIF or an MJPEG, sure. If you are doing forwards and backwards keyframes with a variable number of deltas in between, with motion estimation, with grain generation, you start having a very dynamic amount of state. Granted, encoders are more complex than decoders in some of this. But you still might need to decode between 1 and N frames to get the frame you want, and you don't know how much memory that will consume once decoded unless you decode into bitmaps (at 4K that is over 8MB per frame, which very quickly runs out of memory if you want any sort of frame buffer present).

I suspect the future of video compression will also include frame generation, like what is currently being done for video games. Essentially you have, say, 12 fps video, but your video card fills in the intermediate frames via what is basically generative AI, so you get 120 fps output with smooth motion. I imagine that will never be something that WUFFS is best suited for.
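To make the "decode between 1 and N frames" point concrete, here is a minimal C sketch of seeking in a keyframe-based stream: to show one target frame, everything since the previous keyframe has to be decoded first. The frame count, the keyframe-every-8-frames pattern, and the is_keyframe array are made up for illustration; no real container or codec API is involved.

    /* Hypothetical sketch: seeking in a keyframe-based stream.
     * To show frame `target`, a decoder must walk back to the most
     * recent keyframe and decode every frame from there forward. */
    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_FRAMES 20

    int main(void) {
        /* Toy GOP structure: a keyframe every 8 frames (illustrative). */
        bool is_keyframe[NUM_FRAMES];
        for (int i = 0; i < NUM_FRAMES; i++)
            is_keyframe[i] = (i % 8 == 0);

        int target = 13;              /* frame the user seeked to */
        int start = target;
        while (start > 0 && !is_keyframe[start])
            start--;                  /* walk back to the keyframe */

        /* Everything from `start` to `target` has to be decoded,
         * even though only `target` is displayed. */
        for (int f = start; f <= target; f++)
            printf("decode frame %d%s\n", f,
                   f == target ? " (display)" : " (discard)");
        return 0;
    }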
▲ derf_ a day ago

> But still you might need to decode between 1 and N frames to get the frame you want, and you don't know how much memory it will consume...

All of these things are bounded for actual codecs. AV1 allows storing at most 8 reference frames. The sequence header will specify a maximum allowable resolution for any frame. The number of motion vectors is fixed once you know the resolution. Film grain requires only a single additional buffer.

There are "levels" specified which ensure interoperability at common operating points (e.g., 4K) without even relying on the sequence header (you just reject sequences that fall outside the limits). Those are mostly intended for hardware, but there is no reason a software decoder could not take advantage of them. As long as codecs are designed to be implemented in hardware, this will be possible.
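A rough sketch of how such bounds turn into a concrete memory budget. The 8 reference slots come from the comment above; the bytes-per-pixel figure and the "+1 frame being decoded, +1 film-grain buffer" accounting are illustrative assumptions, not values from the AV1 spec's level tables.

    /* Hedged sketch: deriving a worst-case picture-buffer bound from
     * a fixed reference-slot count and a maximum advertised frame size. */
    #include <stdio.h>
    #include <stdint.h>

    #define AV1_REF_FRAMES 8   /* reference slot count fixed by the format */

    int main(void) {
        uint64_t max_w = 3840, max_h = 2160;          /* assumed 4K ceiling */
        uint64_t bytes_per_frame = max_w * max_h * 3; /* ~3 B/px: 4:2:0 with 16-bit storage */
        /* 8 reference slots + the frame being decoded + one film-grain buffer. */
        uint64_t bound = bytes_per_frame * (AV1_REF_FRAMES + 1 + 1);
        printf("worst-case picture memory: %llu MiB\n",
               (unsigned long long)(bound >> 20));
        return 0;
    }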
▲ GuB-42 a day ago

> I suspect the future of video compression will also include frame generation

That's how most video codecs work already. They try to "guess" what the next frame will be, based on past (for P-frames) and future (for B-frames) frames. The difference is that the codec encodes some metadata to help with the process, plus the difference between the predicted frame and the real frame.

As for using AI techniques to improve prediction, it is not a new thing at all. Many algorithms optimized for compression ratio use neural nets, but these tend to be too computationally expensive for general use. In fact, the Hutter Prize treats text compression as an AI/AGI problem.
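A toy illustration of the prediction-plus-residual idea described above. The "prediction" here is just a copy of the previous frame (a trivial P-frame with zero motion), and the residual values are invented for the example; a real codec would form the prediction with motion compensation and code the residual with a transform.

    /* Toy reconstruction: predicted frame + transmitted residual. */
    #include <stdio.h>
    #include <stdint.h>

    #define W 4
    #define H 2

    int main(void) {
        uint8_t prev[H][W]     = {{10, 10, 20, 20}, {10, 10, 20, 20}};
        int8_t  residual[H][W] = {{ 0,  0,  5,  5}, { 0,  0, -3, -3}};  /* sent in the bitstream */
        uint8_t recon[H][W];

        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++) {
                int v = prev[y][x] + residual[y][x];              /* prediction + residual */
                recon[y][x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);  /* clamp to 8 bits */
            }

        for (int y = 0; y < H; y++) {
            for (int x = 0; x < W; x++)
                printf("%3d ", recon[y][x]);
            printf("\n");
        }
        return 0;
    }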
▲ lubesGordi a day ago

See, this is interesting to me. I understand the desire to dynamically allocate buffers at runtime to capture variable-size deltas. That's cool, but maybe still technically unnecessary? Because, like you say, at 4K and over 8MB per frame, you still can't allocate past a limit, so a codec would likely have some bound set on that anyway. Why not just pre-allocate at compile time? For sure this results in a complex data structure, but functionally it could be the same, and we would elide the cost of dynamic memory allocations.

What I'm suggesting is probably complex, I'm sure. In any case I get what you're saying, and I understand why codecs are going to be dynamically allocating memory, so thanks for that.
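A minimal sketch of the pre-allocation idea being floated here: size every frame buffer at compile time for an assumed worst case and hand out slots from a static pool, so no malloc happens during decode. The 4K / 8-slot / 8-bit 4:2:0 limits are illustrative, not taken from any particular codec's profile.

    /* Static frame pool sized for an assumed worst case; no heap use. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define MAX_W       3840
    #define MAX_H       2160
    #define MAX_REFS    8                              /* assumed worst-case slot count */
    #define FRAME_BYTES (MAX_W * MAX_H * 3 / 2)        /* 8-bit 4:2:0 */

    static uint8_t frame_pool[MAX_REFS][FRAME_BYTES];  /* lives in .bss, no malloc */
    static int     in_use[MAX_REFS];

    static uint8_t *acquire_frame(void) {
        for (int i = 0; i < MAX_REFS; i++)
            if (!in_use[i]) { in_use[i] = 1; return frame_pool[i]; }
        return NULL;  /* pool exhausted: a conforming stream should never hit this */
    }

    static void release_frame(uint8_t *buf) {
        for (int i = 0; i < MAX_REFS; i++)
            if (frame_pool[i] == buf) in_use[i] = 0;
    }

    int main(void) {
        uint8_t *f = acquire_frame();
        printf("got a %zu-byte frame slot at %p\n", (size_t)FRAME_BYTES, (void *)f);
        release_frame(f);
        return 0;
    }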
|
|
▲ zimpenfish a day ago
> codecs are basically designed to minimize the amount of 'work' being done from frame to frame

But to do that they have to keep state and do computations on that state. If frame 47 is a P-frame, you need frame 46 to decode it correctly. Or frame 47 might be a B-frame, in which case you need frame 46 and possibly also frame 48, which means you're having to unpack frames "ahead" of yourself and then keep them around for the next decode. I think that all counts as "dynamic state"?
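A tiny illustration of that reordering, using the frame numbers from the comment above (the GOP pattern itself is hypothetical): the future reference (48) arrives in the bitstream before the B-frame (47) that needs it, so the decoder has to buffer it.

    /* Decode order vs. display order around a B-frame. */
    #include <stdio.h>

    int main(void) {
        int display_order[] = {46, 47, 48};  /* 46=P, 47=B, 48=P (hypothetical) */
        int decode_order[]  = {46, 48, 47};  /* both references come first */

        printf("decode order : ");
        for (int i = 0; i < 3; i++) printf("%d ", decode_order[i]);
        printf("\ndisplay order: ");
        for (int i = 0; i < 3; i++) printf("%d ", display_order[i]);
        printf("\n");
        /* Frame 48 sits in the reference buffer until 47 has been decoded
         * and displayed; that buffered future frame is part of the state. */
        return 0;
    }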
▲ wtallis a day ago

Memory usage can vary, but video codecs are designed to make it practical to derive bounds on those memory requirements, because hardware implementations don't have the freedom to dynamically allocate more silicon.
|
|
▲ dylan604 a day ago
Maybe you're not familiar with how long-GOP encoding works with IPB frames? If all frames were I-frames, what you're thinking might work: everything you need to describe every single pixel is in the one frame. Once you start using P-frames, you have to hold on to data from the I-frame to decode the P-frame. With B-frames, you might need data from frames not yet decoded, as they are bi-directional references.
▲ lubesGordi a day ago

Still, you don't necessarily need dynamic memory allocations if the number of deltas is bounded. In some codecs I could definitely see those having a varying size depending on the amount of change going on in the scene. I'm not a codec developer; I'm only coming at this from an outside, intuitive perspective. Generally, performance-concerned parties want to minimize heap allocations, so I'm interested in how this applies to codec architecture. Codecs seem so complex to me, with so much inscrutable shit going on, but then heap allocations aren't optimized out? It seems like there has to be a very good reason for this.
▲ izacus a day ago

You're actually right about allocation: most video codecs are written with hardware decoders in mind, which have a fixed memory size. This is why their profiles hard-limit the resources needed for decode: resolution, number of reference frames, and so on.

That's not quite the case for encoding. That's where things get murky, since you have far more freedom in what you can do to compress better.
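A small sketch of the accept-or-reject behavior those hard limits make possible: the decoder pre-sizes its buffers for one supported level and refuses streams that advertise anything larger. The level name and numbers are stand-ins, not values from any real level table.

    /* Reject streams whose advertised limits exceed what we pre-sized for. */
    #include <stdio.h>
    #include <stdint.h>

    struct level_limit {
        const char *name;
        uint64_t    max_luma_samples;   /* per frame */
        int         max_ref_frames;
    };

    /* Hypothetical ceiling the decoder was built for. */
    static const struct level_limit supported = { "4K-ish", 3840ull * 2160, 8 };

    static int stream_acceptable(uint64_t w, uint64_t h, int refs) {
        return w * h <= supported.max_luma_samples
            && refs   <= supported.max_ref_frames;
    }

    int main(void) {
        printf("1080p, 4 refs : %s\n", stream_acceptable(1920, 1080, 4) ? "accept" : "reject");
        printf("8K,    8 refs : %s\n", stream_acceptable(7680, 4320, 8) ? "accept" : "reject");
        return 0;
    }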
▲ Sesse__ a day ago

The very good reason is that there's simply not a lot of heap allocations going on. It's easy to check; run perf against e.g. ffmpeg decoding a big file to /dev/null, and observe the distinct lack of malloc high up in the profile.

There's a heck of a lot of distance from “not a lot” to “zero”, though.
|
|
|
▲ throwawaymaths a day ago
Compression algorithms can get very clever in recursive ways.
|
▲ lubesGordi a day ago
Hey, maybe we can discuss why I'm being downvoted? This is a technical discussion and I'm contributing. If you disagree, then say why. I'm not stating anything as fact that isn't fact; I'm getting downvoted for asking a question.