skwb 3 days ago

It's hard to describe, but it's felt like LLMs have completely sucked the entire energy out of computer vision. Like... I know CVPR still happens and there's great research that comes out of it, but almost every single job posting in ML is about LLMs to do this and that to the detriment of computer vision.

jgord 3 days ago | parent | next [-]

yeah, see my other comment.

To me it's totally obvious that we will have a plethora of very valuable startups who use RL techniques to solve real-world problems in practical areas of engineering .. and I just get blank stares when I talk about this :]

I've stopped saying AI when I mean ML or RL .. because people equate LLMs with AI.

We need better ML / RL algos for CV tasks :

  - detecting lines from pixels
  - detecting geometry in pointclouds
  - constructing 3D from stereo images, photogrammetry, 360 panoramas

These might be used by LLMs but are likely built using RL or 'classical' ML techniques, tapping into the vast parallel matmul compute we now have in GPUs, multicore CPUs, and NPUs.
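
As a point of reference for "detecting geometry in pointclouds", here is a minimal sketch of a classical baseline: RANSAC plane fitting. It is not the method used by anyone in this thread; the function names, the `tol` inlier distance and the iteration count are illustrative assumptions, and it uses plain Python rather than GPU matmul.

```python
import random

def fit_plane(p, q, r):
    """Plane through three points as (unit normal, d), or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # Normal is the cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, iters=200, tol=0.02, seed=0):
    """Return the plane with the most inliers and the inlier indices.

    tol is an assumed inlier distance in the same units as the points.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iters):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate (collinear) sample
        n, d = plane
        inliers = [i for i, pt in enumerate(points)
                   if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

A learned model could replace the random sampling step by proposing likely planes first, which is one way ML can build on a heuristic like this.
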
pzo 3 days ago | parent | next [-]

I thought there has been a lot of progress in the last 2 years: (Video) Depth Anything, SAM 2, Grounding DINO, D-FINE, VLMs, Gaussian splats, NeRF. Sure, less than the progress in LLMs, but I would still say progress accelerated with LLM research.

tmilard 2 days ago | parent | prev [-]

You said: "- detecting lines from pixels - detecting geometry in pointclouds - constructing 3D from stereo images, photogrammetry, 360 panoramas"

  ==> For me it is more something like:
   Source = raw video or photo pixels ===> find many simple rectangular surfaces that are glued to one another.
This is, for me, how you can easily get to detecting the rather complex geometry of any room.
jgord 2 days ago | parent [-]

I kind of did a version of what you suggest - I think I linked to a video showing plane edges auto-detected in a pointcloud sample.

Similarly I use another algo to detect pipe runs, which tend to appear as half cylinders in the pointcloud, as the scanner usually sees one side, and often the other side is hidden, hard to access, or up against a wall.

So, I guess my point is the devil is in the details .. and machine learning can optimize even further on good heuristics we might come up with.

Also, when you go thru a whole pointcloud, you have a lot of data to sift thru, so you want something fairly efficient, even if you're using multiple GPUs to do the heavy matmul lifting.

You can think of RL as an optimization - greatly speeding up something like monte carlo tree search, by learning to guess the best solution earlier.

porphyra 3 days ago | parent | prev | next [-]

I feel like 3D reconstruction/bundle adjustment is one of those things where LLMs and new AI stuff haven't managed to get a significant foothold. Recently VGGT won best paper, which is good for them, but for the most part, stuff like NeRF and Gaussian Splatting still rely on good old COLMAP for bundle adjustment using SIFT features.

Also, LLMs really suck at some basic tasks like counting the sides of a polygon.
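
For readers unfamiliar with the bundle adjustment mentioned above: it minimizes the summed squared reprojection error over camera poses and 3D points. Below is a minimal sketch of that cost for a simple pinhole camera. It is not COLMAP's API; the function names and the `(R, t, f, cx, cy)` camera tuple are assumptions made for illustration, and real pipelines add lens distortion and robust losses.

```python
def project(point, R, t, f, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera."""
    # Camera-frame coordinates: X_c = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)

def reprojection_cost(points3d, cameras, observations):
    """Sum of squared pixel errors.

    observations: list of (camera_index, point_index, u, v), e.g. matched
    feature locations. Bundle adjustment searches for the cameras and
    points3d that make this number as small as possible.
    """
    total = 0.0
    for cam_idx, pt_idx, u, v in observations:
        R, t, f, cx, cy = cameras[cam_idx]
        pu, pv = project(points3d[pt_idx], R, t, f, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

An optimizer (typically Levenberg-Marquardt) would then adjust the poses and points to reduce this cost.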

KaiserPro 3 days ago | parent [-]

> LLMs really suck at some basic tasks like counting the sides of a polygon.

Oh indeed, but that's not using tokens correctly. If you want to do that, then tokenise the number of polygons....

Barrin92 2 days ago | parent | prev | next [-]

>but almost every single job posting in ML is about LLMs

Not in the defense sector, or aviation, or UAVs, automotive, etc. Any proper real-time vision task where you have to computationally interact with visual data is unsuited for LLMs.

Nobody controls a drone, missile or vehicle by taking a screenshot, sending it to ChatGPT and having it do math while in flight. Anything that requires, as the title of the thread says, spatial intelligence is unsuited for a language model.

whiplash451 2 days ago | parent | prev | next [-]

It felt the same back in 2012-2015 when deep learning was flooding over computer vision. Yet 10 years later there is a net benefit for computer vision: a lot of tasks are now solved much better/more efficiently with deep learning, including those that seemed "unfit" for deep learning, like tracking.

I'm hopeful that VLMs will "fan out" into a lot of positive outcomes for computer vision.

SlowTao 2 days ago | parent [-]

That is fair. I think it is a case of just seeing a lot of great talent rush to the "in" thing. Other systems are still being developed and that isn't lost, but there is just a feeling of being left out of it all while still doing great stuff.

friendzis 3 days ago | parent | prev | next [-]

What's the equivalent of methadone therapy, but for reckless VC?

What's the equivalent of destroying everything around you while chasing another high, but for reckless VC?

baxtr 3 days ago | parent [-]

Jeopardy?!

glitchc 2 days ago | parent | prev | next [-]

Hah! And I remember when ML itself sucked all the energy out of computer vision. Time to pay the piper.

satyrun 2 days ago | parent | prev | next [-]

Francois Chollet's observation is that LLMs have sucked the air out of the entirety of AI research.

On the other hand I just chatted with Opus 4 for the first time a few minutes ago and I am completely blown away.

m3kw9 2 days ago | parent | prev | next [-]

There is nothing to productize vs LLMs right now. I would say robots could fix that, but they have hard problems to solve in the physical sense that will bottleneck things.

CSMastermind 2 days ago | parent | prev | next [-]

Everyone is trying to jam transformers into CV workflows at the moment. Possibly productively.

smath 2 days ago | parent | prev | next [-]

Agreed about LLMs sucking the air out. The positive side is that it's a good time to innovate in other areas while a chunk of ppl are absorbed in LLMs. A proven improvement in any other non-LLM space will attract investment.

pixl97 2 days ago | parent [-]

Isn't this what Nvidia is already doing with a lot of their sim software?

3 days ago | parent | prev [-]
[deleted]