jgord 3 days ago

makes sense - humans have evolved a lot of wetware dedicated to 3D processing from stereo 2D.

I've made some progress on a PoC in 3D reconstruction - detecting planes, edges and pipes in pointclouds from lidar scans, e.g. https://youtu.be/-o58qe8egS4 .. and am bootstrapping with in-house gigs as I build out the product.

Essentially it breaks down to a ton of matmuls, and I use a lot of tricks from pre-LLM ML .. this is a domain that perfectly fits RL.

The investors I've talked to seem to understand that scan-to-CAD is a real problem with a viable market - automating 5Bn / yr of manual click-labor. But they want to see traction in the form of early sales of the MVP, which is understandable, especially in the current regime of high interest rates.

I've not been able to get across to potential investors the vast implications that better / faster / realtime 3D reconstruction will have for robotics, AI, AR, VR and VFX. It's great that someone of the caliber of Fei-Fei Li is talking about it.

Robots that interact in the real world will need to make a 3D model in realtime and likely share it efficiently with comrades.

While a gaussian splat model is more efficient than a pointcloud, a model which recognizes a wall as a quad plane is much more efficient still, and needed for realtime communication. There is the old idea that compression is equivalent to AI.

What is stopping us from having a Google Street View v3.0 in which I can zoom right into a shopping mall, train station or public building and walk around? Our browsers can do this now, essentially rendering Quake-like 3D environments - the problem is turning a scan into a lightweight 3D model.

Photogrammetry, where you reconstruct the 3D scene from hundreds of photos, uses a lot of compute, and the COLMAP / Structure-from-Motion pipeline predates newer ML approaches and is ripe for a better RL algorithm imo. I've done experiments where you can manually model a 3D scene from well-positioned 360 panorama photos of a building, picking corners, following the outline of walls to make a floorplan etc ... this should be amenable to an RL algorithm. Most 360 panorama photo tours have enough overlap to reconstruct the scene reasonably well.

I have no doubt that we are on the brink of a massive improvement in 3D processing. It's clearly solvable with the ML/RL approaches we currently have .. we don't need AGI. My problem is getting funding to work on it full-time - equivalently, talking an investor into taking that bet :)

MITSardine 2 days ago | parent | next [-]

Have you tried "traditional" approaches like a Delaunay triangulation on the point cloud, and how does your method compare to that? Or did you encounter difficulties with that?
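For context, the "traditional" starting point is a one-liner in SciPy (toy cloud here; in 3D this yields a tetrahedralization of the volume, and a real scan would still need a surface-extraction step, e.g. alpha shapes, on top):

```python
import numpy as np
from scipy.spatial import Delaunay

# Toy "cloud": the 8 corners of a unit cube
pts = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
               dtype=float)

tri = Delaunay(pts)  # in 3D this is a tetrahedralization, not yet a surface
# tri.simplices has shape (n_tets, 4): each row indexes the corners of one tetrahedron
```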

Regarding what you say of planes and compression, you can look into metric-based surface remeshing. Essentially, you estimate surface curvature (second derivatives) and use that to distort length computations, remeshing your surface to unit edge length in that distorted space, which then yields optimal DoFs for a given surface approximation error. A plane (or straight line) has 0 curvature, so the allowed edge lengths are effectively infinite along it (hence final DoFs there are minimal). There's software to do that already, though I'm not sure it's robust to your usecase, because they've been developed for scientific computing with meshes generated from CAD (presumably smoother than your point cloud meshes).
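The length distortion is easy to see in a toy example (the metric values here are illustrative, not taken from any particular remesher):

```python
import numpy as np

# Toy metric tensor built from absolute curvatures: curved along x, flat along y and z
M = np.diag([4.0, 0.0, 0.0])

e = np.array([0.5, 2.0, 0.0])        # candidate edge vector
l_euclid = float(np.linalg.norm(e))  # ordinary Euclidean length
l_metric = float(np.sqrt(e @ M @ e)) # length measured in the metric
# l_metric == 1.0: the flat y direction contributes nothing, so an edge can run
# long along the plane while still measuring "unit length" -> few DoFs there
```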

I'd be really curious to know more about the type of workflow you're interested in, i.e. what does your input look like (do you use some open data sets as well?) and what you hope for in the end (mesh, CAD).

jgord 2 days ago | parent [-]

Short answer: yes .. I tried a _lot_ of approaches, many of which worked partially. I think I linked to a YT video screencast showing edges of planes that my algo had detected in a sample pointcloud?

Efficient re-meshings are important, and it's worth improving on the current algorithms to get crisper breaklines etc, but you really want to go a step further and do what humans do manually now when they make a CAD model from a pointcloud - i.e. convert it to its most efficient / compressed / simple useful format, where a wall face is recognized as a simple plane. Even remeshing and flat-triangle tessellation can be improved a lot by ML techniques.

As with pointclouds, likewise with 'photogrammetry', where you reconstruct a 3D scene from hundreds of photos, or from 360 panoramas or stereo photos. I think in the next 18 months ML will be able to reconstruct an efficient 3D model from a streetview scene, or 360 panorama tour of a building. An optimized mesh is good for visualization in a web browser, but it's even more useful to have a CAD-style model where walls are flat quads, edges are sharp and a door is tagged as a door etc.

Perhaps the points I'm trying to make are:

  - the normal techniques are useful but not quite enough [ heuristics, classical CV algorithms, colmap/SfM ] 
  - NeRFs and gaussian splats are amazing innovations, but don't quite get us there
  - to solve 3D reconstruction, from pointclouds or photos, we need ML to go beyond our normal heuristics : 3D reality is complicated
  - ML, particularly RL, will likely solve 3D reconstruction quite soon, for useful things like buildings
  - this will unlock a lot of value across many domains - AEC / construction, robotics, VR / AR
  - there is low hanging fruit, such as my algo detecting planes and pipes in a pointcloud
  - given the progress and the promise, we should be seeing more investment in this area [ 2Mn of investment could potentially unlock 10Bn/yr in value ]
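On the "low hanging fruit" point, a classical RANSAC plane-fit is the usual baseline (this is a generic sketch, not the algorithm from the video, which isn't described here; points, tolerances and the toy "wall" data are made up for illustration):

```python
import numpy as np

def ransac_plane(points, n_iters=200, tol=0.02, seed=0):
    """Fit the dominant plane in an (N, 3) cloud; returns (unit normal, offset d, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_n, best_d = None, None
    for _ in range(n_iters):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-12:   # degenerate (collinear) sample, skip
            continue
        n = n / np.linalg.norm(n)
        d = -n @ a                      # plane equation: n . x + d = 0
        mask = np.abs(points @ n + d) < tol
        if mask.sum() > best_mask.sum():
            best_n, best_d, best_mask = n, d, mask
    return best_n, best_d, best_mask

# Toy scan: a noisy z = 0 "wall" patch plus uniform clutter
rng = np.random.default_rng(1)
wall = np.c_[rng.uniform(0, 5, (500, 2)), rng.normal(0, 0.005, 500)]
clutter = rng.uniform(0, 5, (100, 3))
normal, d, inliers = ransac_plane(np.vstack([wall, clutter]))
```

Real pipelines iterate this (find a plane, remove its inliers, repeat) and refine each plane with a least-squares fit over its inliers.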
MITSardine a day ago | parent [-]

I'm not sure I'm sold on the necessity to detect lines and planes specifically. My issue is, suppose you could do that perfectly, then what of the rest of the geometry? If you're aiming for a CAD model in the end (BREP), you'll want to fit the whole thing, not only the planes and lines. And it seems to me an approach specialized for lines and planes is helpless at fitting general surfaces and curves. In my mind, a general approach that incidentally also finds straight lines and planes would be better (necessary).

Note if you can fit a BREP, it's fairly trivial to find whether a curve is close enough to a straight line that you can just stipulate it's a straight line (same for a plane).

Have you looked into NURBS fitting through point clouds? I understand those can be noisy and oversampled. A colleague got away with sorting point clouds by a Hilbert curve (or other space-filling curve) and then keeping 1/N points (just by index) - a simple but elegant way to remove N-1 of every N points while keeping the general distribution mostly intact (you could also use an octree). Though I recall in some cases the distribution of points was not uniformly too dense, but e.g. dense along scanning lines and sparse between those lines.
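That decimation trick fits in a few lines; this sketch substitutes a Z-order (Morton) curve for the Hilbert curve, which is simpler to code and has the same "nearby keys are nearby in space" flavor (quantization depth and the toy data are arbitrary choices):

```python
import numpy as np

def morton_keys(points, bits=10):
    """Z-order (Morton) keys - a simpler stand-in for a Hilbert curve."""
    mins, spans = points.min(0), np.ptp(points, 0)
    spans[spans == 0] = 1.0
    q = ((points - mins) / spans * (2**bits - 1)).astype(np.uint64)  # quantize to [0, 2^bits)
    keys = np.zeros(len(points), dtype=np.uint64)
    for b in range(bits):                       # interleave the bits of x, y, z
        for axis in range(3):
            bit = (q[:, axis] >> np.uint64(b)) & np.uint64(1)
            keys |= bit << np.uint64(3 * b + axis)
    return keys

def decimate(points, N):
    """Sort along the space-filling curve, then keep every Nth point by index."""
    order = np.argsort(morton_keys(points))
    return points[order[::N]]

pts = np.random.default_rng(0).uniform(0, 1, (10000, 3))
thinned = decimate(pts, 10)  # 1000 points, spatial distribution roughly preserved
```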

Once it's tractable to triangulate the point cloud, you have two important pieces of information at your disposal: local connectivity (prior to that, it'd have been n log(n) at best to find nearby points) and a notion of topology after some basic processing (e.g. detecting ridges to make out surface patches). With the former, you could do things like smooth the surface to do away with noisiness (say your points are randomly a small distance away from the plane, but any shape really) for better NURBS fitting, estimating normals, etc. and with the latter you could split the domain into faces and curves for your BREP.
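The smoothing step mentioned above can be as simple as umbrella-operator Laplacian smoothing once you have the triangulation's connectivity (a minimal sketch; the tiny noisy patch, iteration count and damping factor are made-up illustration values):

```python
import numpy as np

def laplacian_smooth(verts, tris, n_iters=10, lam=0.5):
    """Umbrella-operator smoothing: pull each vertex toward the centroid of its mesh neighbors."""
    neighbors = [set() for _ in range(len(verts))]
    for a, b, c in tris:        # connectivity comes straight from the triangulation
        neighbors[a] |= {b, c}
        neighbors[b] |= {a, c}
        neighbors[c] |= {a, b}
    v = np.asarray(verts, dtype=float).copy()
    for _ in range(n_iters):
        centroids = np.array([v[sorted(nb)].mean(axis=0) if nb else v[i]
                              for i, nb in enumerate(neighbors)])
        v += lam * (centroids - v)   # damped step toward the local average
    return v

# Noisy flat patch: the z-jitter should flatten out after smoothing
verts = np.array([[0, 0, 0.10], [1, 0, -0.10], [0, 1, 0.05], [1, 1, -0.05]])
tris = [(0, 1, 2), (1, 3, 2)]
smoothed = laplacian_smooth(verts, tris)
```

Plain umbrella smoothing shrinks the surface; Taubin-style alternating positive/negative steps is the usual fix if that matters.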

At the very least, you'd get much cleaner data to feed an ML algo than basic point clouds. I just find it strange to tackle the raw data head on when there's so many methods for dealing with geometry already, at the very least to clean things up and make some sense of the data.

Have you looked at existing products that do point cloud -> CAD? What are they lacking in?

KaiserPro 2 days ago | parent | prev | next [-]

I have worked around spatial AI for a number of years.

Most of the stuff I have been working with has been aimed at low power consumption. One of the things that really helped is not bothering with dense reconstruction at all.

Things like SceneScript and SpaRP, where instead of trying to capture all the geometry (as photogrammetry does), the essential dimensions are captured and either output as a text description (SceneScript) or a simple model with decent normals (SpaRP).

Humans don't really keep complex dense reconstructions in our heads. It's all about spatial relationships of landmarks.

rallyforthesun 2 days ago | parent | prev | next [-]

SplaTAM is an interesting new way to generate 3D Gaussians in real-time. It relies on RGB+D data and doesn't need COLMAP at all. I'm not related to it but am using it for a project with a robot, as its main purpose is to do SLAM. As far as I understand, it uses the point cloud for the alignment of the images.

edit:typo

jgord 3 days ago | parent | prev | next [-]

ps. it's handy to compare the relative data sizes of [ models of ] the same scene : typically, for something like a house, the data will be ballpark :

  -  15GB of pointcloud data ( 100Mn xyzRGB points from a lidar laser scanner )
  -  3 GB of 360 panorama photos
  -  50MB obj 3D textured model
  -  2MB CAD model
I'm guessing a gaussian-splat model would be something like 20x to 40x more efficient than the pointcloud. I achieved similar compression for building scans, using flat textured mini-planes.
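Working just from the ballpark figures in the list above, the ratios come out as:

```python
# Ballpark sizes from the list above (binary GB/MB assumed)
GB, MB = 1024**3, 1024**2
sizes = {"pointcloud": 15 * GB, "panoramas": 3 * GB, "obj mesh": 50 * MB, "CAD": 2 * MB}
ratios = {name: sizes["pointcloud"] / s for name, s in sizes.items()}
# CAD is ~7680x smaller than the raw pointcloud; even the textured obj is ~300x smaller
```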