Workaccount2 8 hours ago

Sadly it still fails the "extra limb" test.

I have a few images of animals with an extra limb photoshopped onto them: a dog with a leg coming out of its stomach, or a cat with two front right legs.

Like every other model I have tested, it insists that the animals have their anatomically correct number of limbs. Even when I point out that there is a leg coming from the dog's stomach, it pushes back and insists I am confused, claiming it counted again and there are definitely only 4. Qwen took it a step further: even after I told it the image was edited, it told me it wasn't and that there were only 4 limbs.

Jackson__ 5 hours ago | parent | next [-]

It fails on any edge case, like all other VLMs. The last time a vision model succeeded at reading analog clocks, a notoriously difficult task, it turned out that it had been trained on nearly 1 million artificial clock images[0] to make that work. In a similar vein, I have not encountered a model that can correctly read, for example, a D20.[1]

It could probably identify extra limbs in your pictures if you too made a million example images to train it on, but until then it will keep failing. And of course you'll get to keep making millions more example images for every other issue you run into.

[0] https://huggingface.co/datasets/allenai/pixmo-clocks

[1] https://files.catbox.moe/ocbr35.jpg

WithinReason 3 hours ago | parent [-]

I can't tell which number is up either, since it's on a white background. Am I an LLM?

ComputerGuru 7 hours ago | parent | prev | next [-]

I wonder whether, if you used their image editing feature, it would insist on “correcting” the number of limbs even when you asked for unrelated changes.

brookst 7 hours ago | parent | prev [-]

Definitely not a good model for accurately counting limbs on mutant species, then. Might be good at other things that have greater representation in the training set.