torginus 5 days ago

'Motion vectors' in H.264 are a weird bit twiddling/image compression hack and have nothing to do with actual motion vectors.

- In a 3D game, a motion vector is the difference between an object's position in 3D space from the previous frame to the current one

- In H.264, the 'motion vector' basically says: copy this rectangular chunk of pixels from some location in some arbitrary previous frame, then encode the difference between that copied prediction and the actual pixels with JPEG-like techniques (DCT et al.); see the sketch below

This block copying is why H.264 video devolves into a mess of squares once the bandwidth craps out.
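
Roughly, the encoder side of that block matching looks like the toy Python sketch below (block size, search range, and the plain SAD metric are simplifications for illustration, not actual H.264 behavior):

    import numpy as np

    def motion_compensate_block(ref, cur, by, bx, bs=16, search=8):
        # Exhaustive search for the best-matching block in the reference
        # frame; returns the motion vector and the residual block that
        # would then be transformed (DCT), quantized, and entropy coded.
        target = cur[by:by+bs, bx:bx+bs].astype(np.int16)
        best_mv, best_sad = (0, 0), float("inf")
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                    continue
                cand = ref[y:y+bs, x:x+bs].astype(np.int16)
                sad = int(np.abs(target - cand).sum())  # sum of absolute differences
                if sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        dy, dx = best_mv
        residual = target - ref[by+dy:by+dy+bs, bx+dx:bx+dx+bs].astype(np.int16)
        return best_mv, residual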

pornel 4 days ago | parent | next [-]

Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

In typical video encoding, motion compensation of course isn't derived from real 3D motion vectors; it's merely a heuristic based on optical flow and a bag of tricks. But in principle, the actual game's motion vectors could be used to guide the video's motion compensation, especially when we're talking about a custom codec rather than reusing the H.264 bitstream format.

Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid "datamoshing" look persisting on packet loss.

However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision, it may make textures blurrier and make movement look wobbly, almost like PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation also needs to be encoded with sharp boundaries between objects; otherwise it causes shimmering around edges.
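
To make the sub-pixel point concrete: fetching a reference block at a fractional offset requires interpolation, and the interpolation itself low-pass filters (blurs) the prediction. A toy sketch with bilinear weights (H.264 actually uses a longer 6-tap filter for half-pel luma, but the principle is the same):

    import numpy as np

    def fetch_subpel(ref, y, x, bs=8):
        # Fetch a bs x bs block from `ref` at fractional position (y, x),
        # assuming the block lies safely inside the frame. The bilinear
        # blend of four integer-aligned blocks is what softens the result.
        y0, x0 = int(np.floor(y)), int(np.floor(x))
        fy, fx = y - y0, x - x0
        a = ref[y0:y0+bs,     x0:x0+bs    ].astype(np.float32)
        b = ref[y0:y0+bs,     x0+1:x0+bs+1].astype(np.float32)
        c = ref[y0+1:y0+bs+1, x0:x0+bs    ].astype(np.float32)
        d = ref[y0+1:y0+bs+1, x0+1:x0+bs+1].astype(np.float32)
        return a*(1-fy)*(1-fx) + b*(1-fy)*fx + c*fy*(1-fx) + d*fy*fx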

CyberDildonics 4 days ago | parent [-]

> Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around; they are floating-point values that get used, along with a depth map, to re-rasterize an image with motion blur.

pornel 4 days ago | parent [-]

They are used for moving pixels around when used in Frame Generation. P-frames in video codecs aim to do exactly the same thing.

Implementation details are quite different, but for reasons unrelated to motion vectors: the video codecs that are established now were designed decades ago, when the use of neural networks was in its infancy and hardware acceleration for NNs was way outside the budget of HW video decoders.

CyberDildonics 4 days ago | parent [-]

There is a lot to unpack here.

First, neural networks don't have anything to do with this.

Second, generating a new frame would use optical flow, and it is always 2D; there is no 3D involved, because it comes from a 2D image, not a 3D scene.

https://en.wikipedia.org/wiki/Optical_flow https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.htm...

Third, optical flow isn't moving blocks of pixels around by an offset and then encoding the difference; it creates a floating-point vector for every pixel and then re-rasterizes the image into a new one.
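
For example, with OpenCV's dense Farneback flow and a warp (the pull-style remap here is a simplification of the forward re-rasterization described above; the parameter values are arbitrary):

    import cv2
    import numpy as np

    def predict_next_frame(prev_gray, next_gray):
        # Predict `next_gray` by warping `prev_gray` with a dense flow
        # field: one floating-point (dx, dy) vector per pixel, not per block.
        # Flow is computed from `next` to `prev`, so each output pixel knows
        # where to fetch its value from in the previous frame (backward warp).
        flow = cv2.calcOpticalFlowFarneback(next_gray, prev_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = prev_gray.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (xs + flow[..., 0]).astype(np.float32)
        map_y = (ys + flow[..., 1]).astype(np.float32)
        # A forward "splat" would instead push pixels to fractional
        # destinations and handle overlaps/gaps; remap is the simpler pull.
        return cv2.remap(prev_gray, map_x, map_y, cv2.INTER_LINEAR)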

pornel 4 days ago | parent [-]

You've previously emphasised the use of blocks in video codecs, as if it were some special distinguishing characteristic, but I wanted to explain that it's an implementation detail, and novel video codecs could take different approaches to encoding P-frames. They don't have to code a literal 2D vector per macroblock that "moves pixels around"; there are already more sophisticated implementations than that. It's an open problem of reusing previous frames' data to predict the next frame (as a base to minimize the residual), and it could be approached in very different ways, including the use of neural networks that predict the motion. I mention NNs to emphasise how different motion compensation can be from just copying pixels on a 2D canvas.

Motion vectors are still motion vectors regardless of how many dimensions they have. You can have per-pixel 3D floating-point motion vectors in a game engine, or you can have 2D-flattened motion vectors in a video codec. They're still vectors, and they still represent motion (or its approximation).

Optical flow is just one possible technique for getting the motion vectors used to code P-frames. Usually video codecs are fed only pixels, so they have no choice but to deduce the motion from the pixels. However, motion estimated via optical flow can be ambiguous (flat surfaces), incorrect (repeating patterns), or non-physical (e.g. the fade-out of a gradient). Poorly estimated motion can cause visible distortions when the residual isn't transmitted with high enough quality to cover it up.

3D motion vectors from a game engine can be projected into 2D to get the exact motion information that can be used for motion compensation/P-frames in video encoding. Games already use it for TAA, so this is going to be pretty accurate and authoritative motion information, and it completely replaces the need to estimate the motion from the 2D pixels. Dense optical flow is a hard problem, and game engines can give the flow field basically for free.
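
A sketch of that projection step, assuming you have a point's world position for the current and previous frame plus the two view-projection matrices (the names here are illustrative, not any particular engine's API):

    import numpy as np

    def screen_motion_vector(pos_now, pos_prev, view_proj_now, view_proj_prev,
                             width, height):
        # Project the current and previous world positions to pixel
        # coordinates and take the difference -- essentially the same
        # per-pixel velocity that game engines feed to TAA.
        def to_screen(p, vp):
            clip = vp @ np.append(p, 1.0)    # homogeneous clip space
            ndc = clip[:2] / clip[3]         # perspective divide -> [-1, 1]
            return (ndc * 0.5 + 0.5) * np.array([width, height])
        return to_screen(pos_now, view_proj_now) - to_screen(pos_prev, view_proj_prev)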

You've misread what I've said about optical flow earlier. You don't need to give me Wikipedia links, I implement codecs for a living.

CyberDildonics 4 days ago | parent [-]

The big difference is that if you are recreating an entire image, with no difference information against a reference image, you can't just move pixels around: you have to get fractional values out of optical flow and move pixels by fractional amounts that potentially overlap in some areas and leave gaps in others.

This means rasterization: splatting the moved pixels as points and taking a weighted average with a kernel that has width and height.
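
Something like this forward splat, in other words (toy version with a bilinear kernel; a real renderer would also weight by depth to resolve occlusion):

    import numpy as np

    def splat_forward(img, flow):
        # Forward-warp a grayscale image by a per-pixel float flow field:
        # each source pixel is splatted at its fractional destination with
        # bilinear weights, then the accumulated weights normalize the
        # result. Gaps keep zero weight; overlapping pixels get averaged.
        h, w = img.shape
        out = np.zeros((h, w), np.float32)
        wgt = np.zeros((h, w), np.float32)
        ys, xs = np.mgrid[0:h, 0:w]
        dst_y = ys + flow[..., 1]
        dst_x = xs + flow[..., 0]
        for dy in (0, 1):
            for dx in (0, 1):
                iy = np.floor(dst_y).astype(int) + dy
                ix = np.floor(dst_x).astype(int) + dx
                wyx = np.clip(1 - np.abs(dst_y - iy), 0, 1) * \
                      np.clip(1 - np.abs(dst_x - ix), 0, 1)
                ok = (iy >= 0) & (iy < h) & (ix >= 0) & (ix < w)
                np.add.at(out, (iy[ok], ix[ok]), img[ok] * wyx[ok])
                np.add.at(wgt, (iy[ok], ix[ok]), wyx[ok])
        return out / np.maximum(wgt, 1e-6)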

Optical flow isn't one technique; it's just a name for getting motion vectors in the first place.

Here is a lecture to help clear it up.

https://www.cs.princeton.edu/courses/archive/fall19/cos429/s...

pornel 4 days ago | parent [-]

I've started this thread by explaining this very problem, so I don't get why you're trying to lecture me on subpel motion and disocclusion.

What's your point? Your replies seem to be just broadly contrarian and patronizing.

I've continued this discussion assuming that maybe we talk past each other by using the term "motion vectors" in narrower and broader meanings, or maybe you did not believe that the motion vectors that game engines have can be incredibly useful for video encoding.

However, you haven't really gotten your point across. I only see that whenever I describe something in a simplified way, you jump to correct me, while failing to realize that I'm intentionally simplifying for brevity and to avoid unnecessary jargon.

CyberDildonics 4 days ago | parent | prev [-]

You said they were the same and then talked about motion vectors from 3D objects and neural networks for an unknown reason.

I'm saying that moving pixels and taking differences against a reference image is different from re-rasterizing an image with distortion and no correction.

robterrell 4 days ago | parent | prev [-]

Isn't the point of the H.264 motion vector to save bits when there is a camera pan? A pan is a case where every pixel in the frame will change, but maybe the encoded data doesn't have to.

superjan 4 days ago | parent [-]

Yes, or when a character moves across the screen. They are quite fine-grained. However, when the decoder reads the motion vectors from the bitstream, it is typically not supposed to attach meaning to them: they could point to a patch that isn't really the same content in the previous frame, but merely looks similar enough to serve as a starting point.
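
For the pan case specifically, a tiny NumPy check shows why it's nearly free to code: one shared motion vector predicts every block exactly, so the residual is all zeros (the frame size and shift here are made up):

    import numpy as np

    # Toy pan: frame2 is frame1 shifted right by 3 pixels.
    rng = np.random.default_rng(0)
    frame1 = rng.integers(0, 256, (64, 64), dtype=np.uint8)
    frame2 = np.roll(frame1, 3, axis=1)

    # Any 16x16 block of frame2 is predicted perfectly by the same
    # motion vector (0, -3) pointing back into frame1:
    block = frame2[16:32, 16:32].astype(np.int16)
    pred  = frame1[16:32, 13:29].astype(np.int16)
    print(int(np.abs(block - pred).sum()))  # residual energy: 0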