CyberDildonics 4 days ago

Motion vectors in video codecs are equivalent to a 2D projection of 3D motion vectors.

3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around; they are floating point values that get used along with a depth map to re-rasterize an image with motion blur.
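
Roughly, something like this (a bare-bones numpy sketch of the gather-style version; the names are made up and it skips the depth-map sorting a real renderer would do):

    import numpy as np

    def motion_blur(frame, motion_2d, samples=8):
        """Average samples taken along each pixel's 2D screen-space motion
        vector (already projected down from the 3D vectors).
        frame: (H, W, 3) float image; motion_2d: (H, W, 2) offsets in pixels."""
        h, w = frame.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
        acc = np.zeros(frame.shape, dtype=np.float32)
        for i in range(samples):
            t = i / (samples - 1)                          # 0..1 along the vector
            sx = np.clip(xs + motion_2d[..., 0] * t, 0, w - 1).astype(int)
            sy = np.clip(ys + motion_2d[..., 1] * t, 0, h - 1).astype(int)
            acc += frame[sy, sx]                           # nearest-neighbour gather
        return acc / samples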

pornel 4 days ago | parent [-]

They are used for moving pixels around in Frame Generation. P-frames in video codecs aim to do exactly the same thing.

Implementation details are quite different, but for reasons unrelated to motion vectors — the video codecs that are established now were designed decades ago, when the use of neural networks was in its infancy, and hardware acceleration for NNs was way outside the budget of HW video decoders.
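
For reference, the classic block-based flavour looks roughly like this (a toy sketch, not any particular codec; it assumes integer motion vectors and frame dimensions that are a multiple of the block size):

    import numpy as np

    def predict_p_frame(reference, mvs, block=16):
        """Toy block-based motion compensation: copy each block from the
        reference frame at the offset given by its motion vector."""
        h, w = reference.shape[:2]
        pred = np.zeros_like(reference)
        for by in range(0, h, block):
            for bx in range(0, w, block):
                dx, dy = mvs[by // block, bx // block]     # one vector per block
                sy = int(np.clip(by + dy, 0, h - block))
                sx = int(np.clip(bx + dx, 0, w - block))
                pred[by:by + block, bx:bx + block] = reference[sy:sy + block, sx:sx + block]
        return pred

    # encoder: residual = current - predict_p_frame(reference, mvs)
    # decoder: current  = predict_p_frame(reference, mvs) + residual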

CyberDildonics 4 days ago | parent [-]

There is a lot to unpack here.

First, neural networks don't have anything to do with this.

Second, generating a new frame would use optical flow, which is always 2D; there is no 3D involved because it comes from a 2D image, not a 3D scene.

https://en.wikipedia.org/wiki/Optical_flow

https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.htm...

Third, optical flow isn't moving blocks of pixels around by an offset and then encoding the difference; it is creating a floating point vector for every pixel and then re-rasterizing the image into a new one.
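
The OpenCV tutorial linked above gets exactly that kind of per-pixel float field, e.g. (the numbers are just the tutorial's Farneback parameters, the file names are placeholders):

    import cv2

    prev_gray = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY)

    # Dense flow: one floating point (dx, dy) vector per pixel, with subpixel precision
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    print(flow.shape, flow.dtype)    # (H, W, 2) float32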

pornel 4 days ago | parent [-]

You've previously emphasised the use of blocks in video codecs, as if it were some special distinguishing characteristic, but I wanted to explain that it's an implementation detail, and novel video codecs could take different approaches to encoding P-frames. They don't have to code a literal 2D vector per macroblock that "moves pixels around". There are already more sophisticated implementations than that. It's an open problem of reusing previous frames' data to predict the next frame (as a base to minimize the residual), and it could be approached in very different ways, including the use of neural networks that predict the motion. I mention NNs to emphasise how different motion compensation can be from just copying pixels on a 2D canvas.

Motion vectors are still motion vectors regardless of how many dimensions they have. You can have per-pixel 3D floating-point motion vectors in a game engine, or you can have 2D-flattened motion vectors in a video codec. They're still vectors, and they still represent motion (or its approximation).

Optical flow is just one possible technique for getting the motion vectors for coding P-frames. Usually video codecs are fed only pixels, so they have no choice but to deduce the motion from the pixels. However, motion estimated via optical flow can be ambiguous (flat surfaces), incorrect (repeating patterns), or non-physical (e.g. the fade-out of a gradient). Poorly estimated motion can cause visible distortions when the residual isn't transmitted with high enough quality to cover it up.

3D motion vectors from a game engine can be projected into 2D to get the exact motion information that can be used for motion compensation/P-frames in video encoding. Games already use it for TAA, so this is going to be pretty accurate and authoritative motion information, and it completely replaces the need to estimate the motion from the 2D pixels. Dense optical flow is a hard problem, and game engines can give the flow field basically for free.
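
Sketched in numpy, the per-pixel projection an engine does is roughly this (made-up names; real engines do it in a shader and also handle TAA jitter, but the idea is the same):

    import numpy as np

    def screen_motion(world_pos, world_pos_prev, view_proj, view_proj_prev, width, height):
        """2D motion in pixels for one surface point: project its current and
        previous world-space positions and take the difference."""
        def project(p, m):
            clip = m @ np.append(p, 1.0)              # world -> clip space
            ndc = clip[:2] / clip[3]                  # perspective divide, [-1, 1]
            return (ndc * 0.5 + 0.5) * np.array([width, height])  # -> pixel coords
        return project(world_pos, view_proj) - project(world_pos_prev, view_proj_prev)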

You've misread what I said about optical flow earlier. You don't need to give me Wikipedia links; I implement codecs for a living.

CyberDildonics 4 days ago | parent [-]

The big difference is that if you are recreating an entire image, and there isn't going to be any difference information against a reference image, you can't just move pixels around; you have to get fractional values out of optical flow and move pixels by fractional amounts that potentially overlap in some areas and leave gaps in others.

This means rasterization: splatting the moved pixels as points and taking a weighted average with a kernel that has width and height.
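
A bare-bones bilinear version of that splat (gaps are just the destination pixels where nothing landed, marked by a near-zero weight):

    import numpy as np

    def forward_splat(frame, flow):
        """Scatter each source pixel to (x + dx, y + dy), spreading it over the
        four surrounding destination pixels with bilinear weights, then normalize."""
        h, w, c = frame.shape
        acc = np.zeros((h, w, c))
        wgt = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                tx, ty = x + flow[y, x, 0], y + flow[y, x, 1]   # fractional target
                x0, y0 = int(np.floor(tx)), int(np.floor(ty))
                fx, fy = tx - x0, ty - y0
                for dy, dx, wk in ((0, 0, (1 - fx) * (1 - fy)),
                                   (0, 1, fx * (1 - fy)),
                                   (1, 0, (1 - fx) * fy),
                                   (1, 1, fx * fy)):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc[yy, xx] += frame[y, x] * wk
                        wgt[yy, xx] += wk
        return acc / np.maximum(wgt, 1e-6)[..., None], wgt   # wgt ~ 0 marks holes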

Optical flow isn't one technique; it's just a name for getting the motion vectors in the first place.

Here is a lecture to help clear it up.

https://www.cs.princeton.edu/courses/archive/fall19/cos429/s...

pornel 4 days ago | parent [-]

I started this thread by explaining this very problem, so I don't get why you're trying to lecture me on subpel motion and disocclusion.

What's your point? Your replies seem to be just broadly contrarian and patronizing.

I've continued this discussion assuming that maybe we were talking past each other by using the term "motion vectors" in narrower and broader meanings, or maybe you didn't believe that the motion vectors game engines already have can be incredibly useful for video encoding.

However, you haven't really gotten your point across. I only see that whenever I describe something in a simplified way, you jump in to correct me, while failing to realize that I'm intentionally simplifying for brevity and to avoid unnecessary jargon.

CyberDildonics 4 days ago | parent | prev [-]

You said they were the same, and then talked about motion vectors from 3D objects and neural networks for an unknown reason.

I'm saying that moving pixels around and taking differences against a reference image is different from re-rasterizing an image with distortion and no correction.