Leptonmaniac 20 hours ago

Can someone ELI5 what this does? I read the abstract and tried to find differences in the provided examples, but I don't understand (and don't see) what the "photorealistic" part is.

emsign 20 hours ago | parent | next [-]

Imagine history documentaries where they take an old photo and free objects from the background and move them round giving the illusion of parallax movement. This software does that in less than a second, creating a 3D model that can be accurately moved (or the camera for that matter) in your video editor. It's not new, but this one is fast and "sharp".

Gaussian splatting is pretty awesome.
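
To make the parallax idea concrete, here is a minimal sketch (not the paper's method; the depth map, baseline, and focal length are assumptions for illustration): given a single RGB image and a per-pixel depth estimate in meters, shift each pixel by its disparity to approximate a small sideways camera move, with a z-buffer so nearer pixels occlude farther ones.

    import numpy as np

    def parallax_shift(rgb, depth, baseline=0.05, focal=500.0):
        """Warp `rgb` as if the camera slid `baseline` meters sideways."""
        h, w = depth.shape
        # Disparity in pixels: nearer pixels (small depth) move more.
        disparity = focal * baseline / np.clip(depth, 1e-3, None)
        out = np.zeros_like(rgb)
        zbuf = np.full((h, w), np.inf)
        for y in range(h):
            for x in range(w):
                nx = int(round(x - disparity[y, x]))
                if 0 <= nx < w and depth[y, x] < zbuf[y, nx]:
                    zbuf[y, nx] = depth[y, x]   # nearer pixel wins the target slot
                    out[y, nx] = rgb[y, x]
        return out  # black gaps are disocclusions a real system would inpaint

    # Tiny synthetic example: a near square over a far background.
    rgb = np.full((120, 160, 3), 40, dtype=np.uint8)
    rgb[40:80, 60:100] = [200, 50, 50]
    depth = np.full((120, 160), 10.0)
    depth[40:80, 60:100] = 2.0
    warped = parallax_shift(rgb, depth)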

crazygringo 8 hours ago | parent | next [-]

Oh man. I never thought about how Ken Burns might use that.

You already sometimes see this where they manually cut out a foreground person from the background, enlarge them a little, and create a multi-layer 3D effect, but it's super primitive and I find it gimmicky.

Bringing actual 3D to old photographs as the camera slowly pans or rotates slightly feels like it could be done really tastefully and well.

kurtis_reed 19 hours ago | parent | prev [-]

What are free objects?

ferriswil 19 hours ago | parent [-]

The "free" in this case is a verb. The objects are freed from the background.

Retr0id 19 hours ago | parent | next [-]

Until your comment I didn't realise I'd also read it wrong (despite getting the gist of it). Attempted rephrase of the original sentence:

Imagine history documentaries where they take an old photo, free objects from the background, and then move them round to give the illusion of parallax.

necovek 18 hours ago | parent | next [-]

I'd suggest a different verb like "detach" or "unlink".

thenthenthen 15 hours ago | parent [-]

isolate from the background?

necovek 8 hours ago | parent [-]

Even better, agreed!

nashashmi 15 hours ago | parent | prev | next [-]

Free objects in the background.

Sharlin 14 hours ago | parent [-]

No, free objects in the foreground, from the background.

tzot 18 hours ago | parent | prev [-]

> Imagine history documentaries where they take an old photo, free objects from the background

Even with the commas, if you keep the ambiguous "free", I'd suggest prefixing "objects" with "the" or "any".

ares623 20 hours ago | parent | prev | next [-]

Takes a 2D image and lets you simulate moving the camera angle, with a correct-ish parallax effect and proper subject isolation (it seems able to handle multiple subjects in the same scene as well).

I guess this is what they use for the portrait mode effects.
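
A naive way to get the subject isolation described above, as a rough sketch (the paper almost certainly does something more sophisticated; the depth threshold and scipy-based labeling here are assumptions for illustration): cut the depth map at a distance threshold and label the connected near regions, one label per subject.

    import numpy as np
    from scipy import ndimage

    def isolate_subjects(depth, near_threshold=3.0):
        """Label map: 0 = background, 1..N = individual near subjects."""
        foreground = depth < near_threshold        # pixels closer than the cutoff
        labels, num_subjects = ndimage.label(foreground)
        return labels, num_subjects

    # Synthetic example: two near blobs in front of a far background.
    depth = np.full((100, 100), 10.0)
    depth[20:40, 10:30] = 1.5                      # subject A
    depth[60:90, 50:80] = 2.0                      # subject B
    labels, n = isolate_subjects(depth)
    print(n)                                       # -> 2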

derleyici 20 hours ago | parent | prev | next [-]

It turns a single photo into a rough 3D scene so you can slightly move the camera and see new, realistic views. "Photorealistic" means it preserves real textures and lighting instead of a flat depth effect. Similar behavior can be seen with Apple's Spatial Scene feature in the Photos app: https://files.catbox.moe/93w7rw.mov

eloisius 20 hours ago | parent | prev | next [-]

From a single picture it infers a hidden 3D representation, from which you can produce photorealistic images from slightly different vantage points (novel views).

avaer 20 hours ago | parent [-]

There's nothing "hidden" about the 3D representation. It's a point cloud (in meters) with colors, and a guess at the "camera" that produced it.

(I am oversimplifying).
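
Roughly what that representation could look like in code, as a sketch under assumptions (a simple pinhole camera and per-pixel metric depth; the class and field names are invented, not the paper's API):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class PinholeCamera:           # the "guess at the camera"
        fx: float                  # focal length in pixels (x)
        fy: float                  # focal length in pixels (y)
        cx: float                  # principal point x
        cy: float                  # principal point y

    @dataclass
    class ColoredPointCloud:       # the point cloud "in meters, with colors"
        points: np.ndarray         # (N, 3) xyz in meters, camera coordinates
        colors: np.ndarray         # (N, 3) uint8 RGB

    def lift_to_point_cloud(rgb, depth, cam: PinholeCamera) -> ColoredPointCloud:
        """Unproject every pixel into 3D using its estimated metric depth."""
        h, w = depth.shape
        ys, xs = np.mgrid[0:h, 0:w]
        x = (xs - cam.cx) * depth / cam.fx
        y = (ys - cam.cy) * depth / cam.fy
        pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
        return ColoredPointCloud(points=pts, colors=rgb.reshape(-1, 3))

A "novel view" is then just this cloud reprojected through a slightly translated or rotated camera.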

uh_uh 19 hours ago | parent | next [-]

"Hidden" or "latent" in a context like this just means variables that the algo is trying to infer because it doesn't have direct access to them.

eloisius 20 hours ago | parent | prev [-]

Hidden in the sense of neural net layers. I mean an intermediate representation.

avaer 20 hours ago | parent [-]

Right.

I just want to emphasize that this is not a NeRF where the model magically produces an image from an angle and then you ask "ok but how did you get this?" and it throws up its hands and says "I dunno, I ran some math and I got this image" :D.

zipy124 15 hours ago | parent | prev | next [-]

Basically depth estimation to split the scene into various planes, then inpainting to fill in the obscured parts of those planes, and then free movement of the planes to allow for parallax. Think of 2D side-scrolling games that use several background layers at different depths to give the illusion of motion and depth.
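
A rough sketch of that layered idea (illustrative only; the quantile binning, the shift rule, and the np.roll wrap-around are simplifications, and the inpainting step is only marked, not implemented): bin the depth map into a few planes, shift each plane in proportion to its inverse depth, and composite back to front like side-scroller parallax layers.

    import numpy as np

    def layered_parallax(rgb, depth, shift_px=20.0, num_planes=4):
        """Composite depth planes, each shifted by ~shift_px / plane depth."""
        h, w, _ = rgb.shape
        edges = np.quantile(depth, np.linspace(0, 1, num_planes + 1))
        out = np.zeros_like(rgb)
        for k in reversed(range(num_planes)):      # composite far -> near
            mask = (depth >= edges[k]) & (depth <= edges[k + 1])
            if not mask.any():
                continue
            dx = int(round(shift_px / depth[mask].mean()))  # near planes move more
            shifted_mask = np.roll(mask, dx, axis=1)
            shifted_rgb = np.roll(rgb, dx, axis=1)
            out[shifted_mask] = shifted_rgb[shifted_mask]
            # The gaps revealed behind nearer planes are where inpainting goes.
        return out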

skygazer 11 hours ago | parent | prev | next [-]

Apple does something similar right now in their photos app, generating spatial views from 2d photos, where parallax is visible by moving your phone. This paper’s technique seems to produce them faster. They also use this same tech in their Vision Pro headset to generate unique views per eye, likewise on spatialized images from Photos.

avaer 20 hours ago | parent | prev | next [-]

It makes your picture 3D. The "photorealistic" part is "it's better than these other ways".

carabiner 19 hours ago | parent | prev | next [-]

Black Mirror episode portraying what this could do: https://youtu.be/XJIq_Dy--VA?t=14. If Apple ran SHARP on this photo and compared it to the show, that would be incredible.

Or if you prefer Blade Runner: https://youtu.be/qHepKd38pr0?t=107

diimdeep 14 hours ago | parent [-]

One more example from Star Trek Into Darkness https://youtu.be/p7Y4nXTANRQ?t=61

p-e-w 20 hours ago | parent | prev [-]

Agreed, this is a terrible presentation. The paper abstract is bordering on word salad, the demo images are meaningless and don’t show any clear difference to the previous SotA, the introduction talks about “nearby” views while the images appear to show zooming in, etc.