Remix.run Logo
MillionOClock a day ago

I also feel like an heavily multimodal model could be very nice for this: allow multiple images from various angles, optionally some true depth data even if imperfect (like what a basic phone LIDAR would output), why not even photos of the same place even if it comes from other sources at other times (just to gather more data), and based on that generate a 3D scene you can explore, using generative AI for filling with plausible content what is missing.