supermatt 15 hours ago

I note the lack of human portraits in the example cases.

My experience with all these solutions to date (including whatever Apple is currently using) is that when viewed stereoscopically, the people end up looking like 2D cutouts against the background.

I haven't seen this particular model in use stereoscopically so I can't comment as to its effectiveness, but the lack of a human face in the example set is likely a bit of a tell.

Granted, they do call it "Monocular View Synthesis", but I'm unclear as to what its accuracy or real-world use would be if you can't combine 2 views to form a convincing stereo pair.

sorenjan 15 hours ago | parent [-]

They're using their Depth Pro model for depth estimation, and that seems to do faces really well.

https://github.com/apple/ml-depth-pro

https://learnopencv.com/depth-pro-monocular-metric-depth/

supermatt 15 hours ago | parent [-]

I'm not sure how the depth estimation alone translates into the view synthesis, but the current on-device implementation is definitely not convincing for any portrait photograph I have seen.

True stereoscopic captures are convincing statically, but don't provide the parallax.

sorenjan 12 hours ago | parent [-]

Good monocular depth estimation is crucial if you want to build a 3D representation from a single image. Ordinarily you have images from several camera poses and can create the Gaussian splats using triangulation; with a single image you have to guess the z position for them.
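To illustrate the single-image case: if a model like Depth Pro gives you metric depth per pixel and you know (or assume) the camera intrinsics, you can lift every pixel to a 3D point in camera space, and those points can seed the splats. This is just an illustrative sketch of the standard pinhole back-projection, not Apple's actual pipeline; the intrinsics matrix here is assumed.

```python
import numpy as np

def unproject(depth, K):
    """Back-project every pixel of a depth map into camera-space 3D points.

    depth: (H, W) metric depth per pixel (e.g. from a monocular depth model)
    K:     (3, 3) pinhole camera intrinsics (assumed known or estimated)
    Returns an (H*W, 3) array of 3D points that could seed Gaussian splats.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates [u, v, 1] for every pixel, row-major.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # per-pixel viewing rays
    return rays * depth.reshape(-1, 1)     # scale each ray by its depth
```

With multiple posed views the z comes from triangulating correspondences instead; here it comes entirely from the depth model, which is why errors in the estimate (flat faces, for instance) show up directly as cardboard-cutout geometry.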

Someone 6 hours ago | parent [-]

For selfies, I think iPhones with Face ID use the TrueDepth camera hardware to measure Z position. That’s not full camera resolution, but it will definitely help.