Remix.run Logo
Lerc a day ago

What models are good at this? I have tried passing images to models and asking them for coordinates for specific features, then overlaid dots on those points and passed that image back to the model so it has a perception of how far out it was. It had a tendency to be consistently off by a fixed amount without getting closer.

I don't doubt that it is possible eventually, but I haven't had much luck.

Something that seemed to assist was drawing a multi coloured transparent chequerboard, if the AI knows the position of the grid colours it can pick out some relative information from the grid.

daemonologist a day ago | parent | next [-]

I've found Qwen3-VL to be fairly accurate at detection (though it doesn't always catch every instance). Note that it gives answers as per-mille-ages, as if the image was 1000x1000 regardless of actual resolution or aspect ratio.

I have also not had luck with any kind of iterative/guess-and-check approach. I assume the models are all trained to one-shot this kind of thing and struggle to generalize to what are effectively relative measurements.

ryoshu 20 hours ago | parent | prev [-]

I can't do that either without opening up an image editing tool. Give the model a tool and goal with "vision". Should work better.