sippeangelo 5 days ago

I know next to nothing about video encoding, but I feel like there should be so much low hanging fruit when it comes to videogame streaming if the encoder just cooperated with the game engine even slightly. Things like motion prediction would be free since most rendering engines already have a dedicated buffer just for that for its own rendering, for example. But there's probably some nasty patent hampering innovation there, so might as well forget it!

torginus 5 days ago | parent | next [-]

'Motion vectors' in H.264 are a weird bit twiddling/image compression hack and have nothing to do with actual motion vectors.

- In a 3D game, a motion vector is the difference in an object's position in 3D space between the previous frame and the current one

- In H.264, the 'motion vector' basically says: copy this rectangular chunk of pixels from some location in an arbitrary previous frame, then encode the difference between the reference pixels and the copy with JPEG-like techniques (DCT et al.), as sketched below

This block copying is why H.264 video devolves into a mess of squares once the bandwidth craps out.
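
For anyone who wants the gist in code, here is a minimal decoder-side sketch of that block copying, assuming full-pel motion and 16x16 blocks (the names are made up; real H.264 adds sub-pixel interpolation, variable block sizes, multiple reference frames and much more):

    // Illustrative sketch of block-based motion compensation, not real H.264.
    // The "motion vector" is just an (mv_x, mv_y) offset telling the decoder
    // which block of the reference frame to copy before adding the residual.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Frame {
        int w, h;
        std::vector<uint8_t> luma; // w*h samples, row-major
        uint8_t at(int x, int y) const {
            x = std::clamp(x, 0, w - 1);
            y = std::clamp(y, 0, h - 1);
            return luma[y * w + x];
        }
    };

    // Decoder side: rebuild one 16x16 block from a motion vector and a residual.
    void reconstruct_block(const Frame& ref, Frame& out,
                           int bx, int by,          // top-left of block in 'out'
                           int mv_x, int mv_y,      // full-pel motion vector
                           const int16_t residual[16 * 16]) {
        for (int y = 0; y < 16; ++y)
            for (int x = 0; x < 16; ++x) {
                int pred = ref.at(bx + x + mv_x, by + y + mv_y); // copied pixels
                int rec  = pred + residual[y * 16 + x];          // add coded diff
                out.luma[(by + y) * out.w + (bx + x)] =
                    static_cast<uint8_t>(std::clamp(rec, 0, 255));
            }
    }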

pornel 4 days ago | parent | next [-]

Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

In typical video encoding, motion compensation of course isn't derived from real 3D motion vectors; it's merely a heuristic based on optical flow and a bag of tricks. But in principle the game's actual motion vectors could be used to guide the video's motion compensation. This is especially true when we're talking about a custom codec rather than reusing the H.264 bitstream format.

Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid a "datamoshing" look persisting after packet loss.

However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision, it may make textures blurrier and movement look wobbly, almost like PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation also should be encoded with sharp boundaries between objects, as otherwise it causes shimmering around edges.
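
To make the sub-pixel point concrete, here is a tiny sketch of motion-compensated prediction at a fractional offset using bilinear resampling (a simplification; H.264 itself uses a 6-tap half-pel filter plus quarter-pel averaging, and the names here are invented). Every resampling pass low-passes the texture slightly, which is where the blur and wobble come from:

    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Sample the reference frame at a fractional position with bilinear weights.
    float sample_bilinear(const std::vector<uint8_t>& img, int w, int h,
                          float x, float y) {
        int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
        float fx = x - x0, fy = y - y0;
        auto px = [&](int xi, int yi) -> float {
            xi = xi < 0 ? 0 : (xi >= w ? w - 1 : xi);
            yi = yi < 0 ? 0 : (yi >= h ? h - 1 : yi);
            return img[yi * w + xi];
        };
        return (1 - fx) * (1 - fy) * px(x0, y0)     + fx * (1 - fy) * px(x0 + 1, y0)
             + (1 - fx) * fy       * px(x0, y0 + 1) + fx * fy       * px(x0 + 1, y0 + 1);
    }

    // Predict one pixel of the current frame from the reference using a
    // fractional motion vector (mv_x, mv_y).
    float predict_pixel(const std::vector<uint8_t>& ref, int w, int h,
                        int x, int y, float mv_x, float mv_y) {
        return sample_bilinear(ref, w, h, x + mv_x, y + mv_y);
    }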

CyberDildonics 4 days ago | parent [-]

> Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around; they are floating-point values that get used along with a depth map to re-rasterize an image with motion blur.

pornel 4 days ago | parent [-]

They are used for moving pixels around when used in Frame Generation. P-frames in video codecs aim to do exactly the same thing.

Implementation details are quite different, but for reasons unrelated to motion vectors: the video codecs that are established now were designed decades ago, when the use of neural networks was in its infancy, and hardware acceleration for NNs was way outside the budget of HW video decoders.

CyberDildonics 4 days ago | parent [-]

There is a lot to unpack here.

First, neural networks don't have anything to do with this.

Second, generating a new frame would be optical flow, and it is always 2D. There is no 3D involved, because it's computed from a 2D image, not a 3D scene.

https://en.wikipedia.org/wiki/Optical_flow https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.htm...

Third, optical flow isn't moving blocks of pixels around by an offset and then encoding the difference; it is creating a floating-point vector for every pixel and then re-rasterizing the image into a new one.

pornel 4 days ago | parent [-]

You've previously emphasised use of blocks in video codecs, as if it was some special distinguishing characteristic, but I wanted to explain that's an implementation detail, and novel video codecs could have different approaches to encoding P-frames. They don't have to code a literal 2D vector per macroblock that "moves pixels around". There are already more sophisticated implementations than that. It's an open problem of reusing previous frames' data to predict the next frame (as a base to minimize the residual), and it could be approached in very different ways, including use of neural networks that predict the motion. I mention NNs to emphasise how different motion compensation can be than just copying pixels on a 2D canvas.

Motion vectors are still motion vectors regardless of how many dimensions they have. You can have per-pixel 3D floating-point motion vectors in a game engine, or you can have 2D-flattened motion vectors in a video codec. They're still vectors, and they still represent motion (or its approximation).

Optical flow is just one possible technique of getting the motion vectors for coding P-frames. Usually video codecs are fed only pixels, so they have no choice but to deduce the motion from the pixels. However, motion estimated via optical flow can be ambiguous (flat surfaces) or incorrect (repeating patterns), or non-physical (e.g. fade-out of a gradient). Poorly estimated motion can cause visible distortions when the residual isn't transmitted with high-enough quality to cover it up.

3D motion vectors from a game engine can be projected into 2D to get the exact motion information that can be used for motion compensation/P-frames in video encoding. Games already use it for TAA, so this is going to be pretty accurate and authoritative motion information, and it completely replaces the need to estimate the motion from the 2D pixels. Dense optical flow is a hard problem, and game engines can give the flow field basically for free.
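
As a rough illustration, here is a sketch of flattening an engine-side motion vector (previous vs. current world-space position of a surface point) into the 2D pixel displacement a codec could use for motion compensation. The matrix layout, naming, and screen-space conventions are assumptions, not any particular engine's API:

    #include <array>
    #include <cmath>

    using Vec4 = std::array<float, 4>;
    using Mat4 = std::array<float, 16>; // row-major

    Vec4 mul(const Mat4& m, const Vec4& v) {
        Vec4 r{};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                r[i] += m[i * 4 + j] * v[j];
        return r;
    }

    // Pixel-space displacement of a surface point between two frames, i.e. the
    // kind of per-pixel motion vector a TAA velocity buffer stores.
    std::array<float, 2> screen_space_motion(const Mat4& view_proj_prev,
                                             const Mat4& view_proj_curr,
                                             const Vec4& world_pos_prev,
                                             const Vec4& world_pos_curr,
                                             float width, float height) {
        auto to_pixels = [&](const Mat4& vp, const Vec4& p) {
            Vec4 clip = mul(vp, p);
            float x = clip[0] / clip[3], y = clip[1] / clip[3]; // NDC in [-1, 1]
            return std::array<float, 2>{ (x * 0.5f + 0.5f) * width,
                                         (1.0f - (y * 0.5f + 0.5f)) * height };
        };
        auto prev = to_pixels(view_proj_prev, world_pos_prev);
        auto curr = to_pixels(view_proj_curr, world_pos_curr);
        return { curr[0] - prev[0], curr[1] - prev[1] };
    }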

You've misread what I've said about optical flow earlier. You don't need to give me Wikipedia links, I implement codecs for a living.

CyberDildonics 4 days ago | parent [-]

The big difference is that if you are recreating an entire image, and there isn't going to be any difference information against a reference image, you can't just move pixels around. You have to get fractional values out of optical flow and move pixels by fractional amounts, which potentially overlap in some areas and leave gaps in others.

This means rasterization: making a weighted average of the moved pixels, treated as points, with a kernel that has width and height.
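
A minimal sketch of that re-rasterization, assuming a dense per-pixel flow field and a simple 2x2 bilinear splat kernel (real implementations use wider kernels and still have to in-paint the gaps that get no contribution):

    #include <cmath>
    #include <vector>

    // Forward-warp a grayscale image with fractional per-pixel flow: splat each
    // source pixel at its moved position with bilinear weights, then normalize.
    void forward_warp(const std::vector<float>& src,    // w*h source image
                      const std::vector<float>& flow_x, // per-pixel dx
                      const std::vector<float>& flow_y, // per-pixel dy
                      int w, int h, std::vector<float>& dst) {
        std::vector<float> accum(w * h, 0.0f), weight(w * h, 0.0f);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                float tx = x + flow_x[y * w + x];
                float ty = y + flow_y[y * w + x];
                int x0 = (int)std::floor(tx), y0 = (int)std::floor(ty);
                float fx = tx - x0, fy = ty - y0;
                const float wts[4] = { (1 - fx) * (1 - fy), fx * (1 - fy),
                                       (1 - fx) * fy,       fx * fy };
                const int xs[4] = { x0, x0 + 1, x0,     x0 + 1 };
                const int ys[4] = { y0, y0,     y0 + 1, y0 + 1 };
                for (int k = 0; k < 4; ++k) {
                    if (xs[k] < 0 || xs[k] >= w || ys[k] < 0 || ys[k] >= h) continue;
                    accum[ys[k] * w + xs[k]]  += wts[k] * src[y * w + x];
                    weight[ys[k] * w + xs[k]] += wts[k];
                }
            }
        dst.assign(w * h, 0.0f);
        for (int i = 0; i < w * h; ++i)
            if (weight[i] > 0.0f) dst[i] = accum[i] / weight[i]; // gaps stay 0
    }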

Optical flow isn't one technique; it's just a name for getting motion vectors in the first place.

Here is a lecture to help clear it up.

https://www.cs.princeton.edu/courses/archive/fall19/cos429/s...

pornel 4 days ago | parent [-]

I've started this thread by explaining this very problem, so I don't get why you're trying to lecture me on subpel motion and disocclusion.

What's your point? Your replies seem to be just broadly contrarian and patronizing.

I've continued this discussion assuming that maybe we talk past each other by using the term "motion vectors" in narrower and broader meanings, or maybe you did not believe that the motion vectors that game engines have can be incredibly useful for video encoding.

However, you haven't really communicated your point across. I only see that whenever I describe something in a simplified way, you jump to correct me, while failing to realize that I'm intentionally simplifying for brevity and to avoid unnecessary jargon.

CyberDildonics 4 days ago | parent | prev [-]

You said they were the same and then talked about motion vectors from 3D objects and neural networks for an unknown reason.

I'm saying that moving pixels and taking differences against a reference image is different from re-rasterizing an image with distortion and no correction.

robterrell 4 days ago | parent | prev [-]

Isn't the point of the H.264 motion vector to save bits when there is a camera pan? A pan is a case where every pixel in the frame changes, but the encoded data maybe doesn't have to.

superjan 4 days ago | parent [-]

Yes, or when a character moves across the screen. They are quite fine grained. However, when the decoder reads the motion vectors from the bitstream, it is typically not supposed to attach meaning to them: they could point to a patch that is not the same patch in the previous scene, but looks similar enough to serve as a starting point.

ChadNauseam 4 days ago | parent | prev | next [-]

I think you're right. Suppose the connection to the game streaming service adds two frames of latency, and the player is playing an FPS. One thing game engines could do is provide the game UI and the "3D world view" as separate framebuffers. Then, when moving the mouse on the client, the software could translate the 3D world view instantly for the next two frames that come from the server but were rendered before the user moved their mouse.

VR games already do something like this, so that when a game runs at below the maximum FPS of the VR headset, it can still respond to your head movements. It's not perfect because there's no parallax and it can't show anything for the region that was previously outside of your field of view, but it still makes a huge difference. (Of course, it's more important for VR because without doing this, any lag spike in a game would instantly induce motion sickness in the player. And if they wanted to, parallax could be faked using a depth map)
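
As a very rough sketch of the client-side step, this is the kind of shift you could apply to the streamed world-view layer to account for the rotation the player added after the frame was rendered (rotation-only, no parallax; function and parameter names are made up, and signs depend on your conventions):

    #include <array>
    #include <cmath>

    // Pixel shift for the world-view layer given the extra yaw/pitch (radians)
    // accumulated since the displayed frame's timestamp.
    std::array<float, 2> late_reprojection_shift(float yaw_delta, float pitch_delta,
                                                 float horizontal_fov, float vertical_fov,
                                                 float width, float height) {
        float fx = (width  * 0.5f) / std::tan(horizontal_fov * 0.5f); // focal length, px
        float fy = (height * 0.5f) / std::tan(vertical_fov   * 0.5f);
        return { -std::tan(yaw_delta) * fx, std::tan(pitch_delta) * fy };
    }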

rowanG077 4 days ago | parent [-]

You can do parallax if you use the depth buffer.

WantonQuantum 4 days ago | parent | prev | next [-]

A simple thing to start with would be akin to Sensor Assisted Video Encoding where phone accelerometers and digital compasses are used to give hints to video encoding: https://ieeexplore.ieee.org/document/5711656

Also, in 2D, a simple side-scrolling game could give very accurate motion vectors for the background and for large, linearly moving foreground objects.

I'm surprised at the number of people disagreeing with your idea here. I think HN has a lot of "if I can't see how it can be done then it can't be done" people.

Edit: Also, any 2D graphical overlays like HUDs, maps, scores, subtitles, menus, etc. could be sent as separately compressed 2D data, which could enable better compression for that content - for example, much sharper, pixel-perfect encoding of simple shapes.

derf_ 4 days ago | parent [-]

> I think HN has a lot of "if I can't see how it can be done then it can't be done" people.

No, HN has, "This has been thought of a thousand times before and it's not actually that good of an idea," people.

The motion search in a video encoder is highly optimized. Take your side-scroller as an example. If several of your neighboring blocks have the same MV, that is the first candidate your search is going to check, and if the match is good, you will not check any others. The check itself has specialized CPU instructions to accelerate it. If the bulk of the screen really has the same motion, the entire search will take a tiny fraction of the encoding time, even in a low-latency, real-time scenario. Even if you reduce that to zero, you will barely notice.
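
For illustration, a toy version of that predictor-first search (real encoders use SIMD SAD instructions, rate-distortion costs, and far smarter search patterns; everything here is invented and bounds checks are omitted):

    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    struct MV { int x, y; };

    // Sum of absolute differences between two 16x16 blocks.
    static int sad_16x16(const uint8_t* cur, const uint8_t* ref, int stride) {
        int sad = 0;
        for (int y = 0; y < 16; ++y)
            for (int x = 0; x < 16; ++x)
                sad += std::abs(cur[y * stride + x] - ref[y * stride + x]);
        return sad;
    }

    // Try the MV predicted from neighboring blocks first; only fall back to a
    // brute-force window if the match isn't already good enough.
    MV search_block(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
                    int stride, int bx, int by, MV predicted, int early_exit_sad) {
        const uint8_t* cur_blk = &cur[by * stride + bx];
        auto cost = [&](MV mv) {
            return sad_16x16(cur_blk, &ref[(by + mv.y) * stride + (bx + mv.x)], stride);
        };
        MV best = predicted;
        int best_sad = cost(best);
        if (best_sad <= early_exit_sad) return best; // common case: done already
        for (int dy = -8; dy <= 8; ++dy)
            for (int dx = -8; dx <= 8; ++dx) {
                int s = cost({dx, dy});
                if (s < best_sad) { best_sad = s; best = {dx, dy}; }
            }
        return best;
    }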

On the other end of the spectrum, consider a modern 3D engine. There will be many things not describable by block-based motion of the underlying geometry: shadows, occlusions, reflections, clouds or transparency, shader effects, water or atmospheric effects, etc. Even if you could track the "real" motion through all of that, the best MV to use for compression does not need to match the real motion (the real motion might be very expensive to code while something "close enough" is much cheaper, to give just one reason), and it might come from any number of frames (not necessarily the most recent). So you still need to do a search, and it's not obvious the real motion is a much better starting point than the heuristics an encoder already uses, where they even differ.

All of that said, some encoder APIs do allow providing motion hints [0], you will find research papers and theses on the topic, and of course, patents. That the technique is not more widespread is not because no one ever tried to make it work.

[0] https://docs.nvidia.com/video-technologies/video-codec-sdk/1... as the first random result of a simple search.

WantonQuantum 4 days ago | parent [-]

> If several of your neighboring blocks have the same MV

I think we’re mostly agreeing here. Finding the MVs in any block takes time. Time that can be saved by hints about the direction of motion. Sure, once some motion vectors are found then other blocks benefit by initially assuming similar vectors. To speed things up why not give the hints right away if they’re known a priori?

mikepurvis 4 days ago | parent | prev | next [-]

I’ve wondered about this as well; most clients should be capable of doing at least a bit of compositing. Like if you sent billboard renders of background objects at lower fidelity/frequency than foreground characters, updated HUD objects with priority and with codecs that prioritize clarity, etc.

It was always shocking to me that Stadia was literally making their own games in house, and somehow the end result was still just a streamed video, with the latency gains supposed to come from edge-deployed GPUs and a Wi-Fi-connected controller.

Then again, maybe they tried some of this stuff and the gains weren't worth it relative to battle-tested video codecs.

toast0 5 days ago | parent | prev | next [-]

For 2d sprite games, OMG yes, you could provide some very accurate motion vectors to the encoder. For 3d rendered games, I'm not so sure. The rendering engine has (or could have) motion vectors for the 3d objects, but you'd have to translate them to the 2d world the encoder works in; I don't know if it's reasonable to do that ... or if it would help the encoder enough to justify the effort.

sudosysgen 5 days ago | parent [-]

Schemes like DLSS already provide 2D motion vectors, so it's not necessarily a crazy ask.

markisus 4 days ago | parent | prev | next [-]

The ultimate compression is to send just the user inputs and reconstitute the game state on the other end.

w-ll 4 days ago | parent | next [-]

The issue is the "reconstitute the game state on the other end" part, at least given how I travel.

I haven't in a while, but I used to use https://parsec.app/ on a cheap Intel Air to do my STO dailies on vacation. It sends inputs, but gets a compressed stream back. I'm curious about any open-source version of something similar.

Zardoz84 4 days ago | parent | prev [-]

Good old DooM save demos are essentially this.

cma 5 days ago | parent | prev | next [-]

> Things like motion prediction would be free since most rendering engines already have a dedicated buffer just for that for its own rendering, for example.

Doesn't work for translucency and shader animation. The latter can be made to work if the shader can also calculate motion vectors.

WithinReason 4 days ago | parent | prev | next [-]

Instead of motion vectors you probably want to send RGBD (+depth) so the client can compute its own motion vectors based on input, depth, and camera parameters. You get instant response to user input this way, but you need to in-paint disocclusions somehow.
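
A sketch of that idea, assuming the client gets per-pixel depth (NDC z/w) plus the old and new camera matrices and derives a per-pixel motion field by unprojecting with the old camera and reprojecting with the new one (matrix conventions and names are assumptions; the resulting flow can drive a forward warp, and the disocclusion holes still need in-painting):

    #include <array>
    #include <vector>

    using Vec4 = std::array<float, 4>;
    using Mat4 = std::array<float, 16>; // row-major

    static Vec4 mul(const Mat4& m, const Vec4& v) {
        Vec4 r{};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                r[i] += m[i * 4 + j] * v[j];
        return r;
    }

    // depth: NDC depth (z/w) per pixel of the received frame.
    // inv_view_proj_old: unprojects old-frame pixels back to world space.
    // view_proj_new: projects world space with the client's latest camera pose.
    void reprojection_flow(const std::vector<float>& depth, int w, int h,
                           const Mat4& inv_view_proj_old, const Mat4& view_proj_new,
                           std::vector<float>& flow_x, std::vector<float>& flow_y) {
        flow_x.assign(w * h, 0.0f);
        flow_y.assign(w * h, 0.0f);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                // Old pixel -> NDC -> world position, using the transmitted depth.
                float ndc_x = (x + 0.5f) / w * 2.0f - 1.0f;
                float ndc_y = 1.0f - (y + 0.5f) / h * 2.0f;
                Vec4 world = mul(inv_view_proj_old,
                                 {ndc_x, ndc_y, depth[y * w + x], 1.0f});
                float inv_w = 1.0f / world[3];
                for (float& c : world) c *= inv_w;
                // World position -> new camera -> new pixel position.
                Vec4 clip = mul(view_proj_new, world);
                float nx = (clip[0] / clip[3] * 0.5f + 0.5f) * w;
                float ny = (1.0f - (clip[1] / clip[3] * 0.5f + 0.5f)) * h;
                flow_x[y * w + x] = nx - (x + 0.5f);
                flow_y[y * w + x] = ny - (y + 0.5f);
            }
    }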

dmos62 4 days ago | parent | prev | next [-]

Could you say more? My first thought is that CPUs and GPUs have much higher bandwidths and lower latencies than ethernet, so just piping some of that workload to a streaming client wouldn't be feasible. Am I wrong?

IshKebab 5 days ago | parent | prev | next [-]

I don't think games do normally have a motion vector buffer. I guess they could render one relatively easily, but that's a bit of a chicken and egg problem.

garaetjjte 5 days ago | parent | next [-]

They do. One reason is postprocessing effects like motion blur; another is antialiasing like TAA, or DLSS upscaling.

shmerl 4 days ago | parent | next [-]

Many games have it, but I always turn it off. I guess some like its cinematic effect, but I prefer less motion blur, not more.

theshackleford 4 days ago | parent [-]

Modern monitor technology has more than enough blur of its own that adding more is most certainly not my cup of tea. Made worse, ironically, by modern rendering techniques...

Though my understanding is that it helps hide shakier framerates in console land. Which sounds like it could be a thing...

shmerl 4 days ago | parent [-]

If anything, high refresh rate displays are trying to reduce motion blur. Artificially adding it back sounds weird and counter intuitive.

tjoff 4 days ago | parent [-]

It adds realism.

Your vision has motion blur. Staring at your screen at a fixed distance with no movement is highly unrealistic and allows you to see crisp 4K images no matter the content. This results in a cartoonish experience because it mimics nothing in real life.

Now you do have the normal problem that the designers of the game/movie can't know for sure what part of the image you are focusing on (my pet peeve with 3D movies) since that affects where and how you would perceive the blur.

There's also the problem of overuse, or of using it to mask other issues, or just as an artistic choice.

But it makes total sense to invest in a high refresh display with quick pixel transitions to reduce blur, and then selectively add motion blur back artificially.

Turning it off is akin to cranking up the brightness to 400% because otherwise you can't make out details in the dark parts of the game ... that's the point.

But if you prefer it off then go ahead, games are meant to be enjoyed!

oasisaimlessly 4 days ago | parent [-]

Your eyes do not have built-in motion blur. If they are accurately tracking a moving object, it will not be seen as blurry. Artificially adding motion blur breaks this.

tjoff 3 days ago | parent [-]

Sure they do: the moving object in focus will not have motion blur, but the surroundings will. Motion blur is not indiscriminately adding blur everywhere.

theshackleford 3 days ago | parent [-]

> Motion blur is not indiscriminately adding blur everywhere.

Motion blur in games is inaccurate and exaggerated and isn’t close to presenting any kind of “realism.”

My surroundings might have blur, but I don’t move my vision in the same way a 3d camera is controlled in game, so in the “same” circumstances I do not see the blur you do when moving a camera in 3d space in a game. My eyes jump from point to point, meaning the image I see is clear and blur free. When I’m tracking a single point, that point remains perfectly clear whilst sure, outside of that the surroundings blur.

However, motion blur in games literally cannot replicate either of these realities; it just adds a smear on top of a smear on top of a smear.

So given both are unrealistic, I'd appreciate the one that's far closer to how I actually see, which is the one without yet another layer of blur. Modern displays add blur, modern rendering techniques add more; I don't need EVEN more added on top of that with in-game blur.

tjoff 2 days ago | parent [-]

Yes, and that was exactly my point in my original post...

With or without, neither is going to be perfect. At least when not even attempting eye-tracking. But there are still many reasons to do it.

IshKebab 5 days ago | parent | prev [-]

Yeah I did almost mention motion blur but do many games use that? I don't play many AAA games TBF so maybe I'm just out of date...

Take something like Rocket League for example. Definitely doesn't have velocity buffers.

raincole 5 days ago | parent | next [-]

> Take something like Rocket League for example. Definitely doesn't have velocity buffers.

How did you reach this conclusion? Rocket League looks like a game that definitely has velocity buffers to me. (Many fast-moving scenarios + motion blur)

IshKebab 4 days ago | parent [-]

It doesn't have motion blur. At least I've never seen any.

Actually I just checked and it does have a motion blur setting... maybe I just turned it off years ago and forgot or something.

izacus 5 days ago | parent | prev [-]

Yes, most games these days have motion blur and motion vector buffers.

Yes, even Rocket League has them.

ACCount36 5 days ago | parent | prev [-]

Exposing motion vectors is a prerequisite for a lot of AI framegen tech. What if you could tap into that?

tomaskafka 4 days ago | parent | prev | next [-]

Also, all major GPUs now have machine-learning-based next-frame prediction; it's hard to imagine this wouldn't be useful.

keyringlight 4 days ago | parent [-]

Plus there's the question of whether further benefits are available for the FSR/DLSS/XeSS-type upscalers in knowing more about the scene. I'm reminded a bit of variable rate shading, where the renderer analyses the scene for where detail levels will reward spending performance and can assign blocks (e.g. 1x2, 4x2 pixels, etc.) to be shaded once instead of per-pixel, to concentrate effort there. It's not exactly the same thing as the upscalers, but it seems a better foundation for a good output image compared to bluntly dropping the whole rendered resolution by a percentage. However, that's assuming traditional rendering before any ML gets involved, which I think has proven its case in the past 7 years.

I think the other side to this is the difference between further integration of the engine and scaler/frame generation which would seem to involve a lot of low level tuning (probably per-title), and having a generic solution that uplifts as many titles as possible even if there's "perfect is the enemy of good" left on the table.

d--b 4 days ago | parent | prev [-]

The point of streaming games though is to offload the hard computation to the server.

I mean you could also ship the textures ahead of time so that the compressor could look up if something looks like a distorted texture. You could send the geometry of what's being rendered, that would give a lot of info to the decompressor. You could send the HUD separately. And so on.

But here you want something that's high level and works with any game engine, any hardware. Since the main issue is latency rather than bandwidth, you really don't want to add calculation cycles.