WarmWash 3 hours ago

3.1 Pro is the first model to correctly count the number of legs on my "five legged dog" test image. 3.0 flash was the previous best, getting it after a few prompts of poking. 3.1 got it on the first prompt though, with the prompt being "How many legs does the dog have? Count Carefully".

However, it didn't get it on the first try with the original prompt (prompt: "How many legs does the dog have?"). It initially said 4, then a follow-up prompt got it to hesitantly say 5, reasoning that one limb must be obscured or hidden.

So maybe I'll give it a 90%?

This is without tools as well.

merlindru 3 hours ago | parent [-]

your question may have become part of the training data with how much coverage there was around it. perhaps you should devise a new test :P

iamdelirium 2 hours ago | parent | next [-]

3.1 Pro has the same Jan 2025 knowledge cutoff as the other 3 series models. So if 3.1 has it in its training data, the other ones would have as well.

devsda 2 hours ago | parent | prev | next [-]

I suggest asking it to identify/count the number of fire hydrants, crosswalks, bridges, bicycles, cars, buses and traffic signals etc.

Pit Google against Google :D

gallerdude 3 hours ago | parent | prev | next [-]

My job may have become part of the training data with how much coverage there is around it. Perhaps another career would be a better test of LLM capabilities.

suddenlybananas 3 hours ago | parent [-]

Have you ever heard of a black swan?

WarmWash 3 hours ago | parent | prev | next [-]

Honestly, at this point I have fed this image into so many models so many times that it also functions as a test for "Are they training on my image specifically?" (they are training on it in general, for sure, but only along with everything else in the ocean of info people dump in).

I genuinely don't think they are. GPT-5.2 still stands by 4 legs, and OAI has been getting this image consistently for over a year. And 3.1 still fumbled with the harder prompt "How many legs does the dog have?". I needed to add the "count carefully" part to tip it off that something was amiss.

Since it did well I'll make some other "extremely far out of the norm" images to see how it fares. A spider with 10 legs or a fish with two side fins.

wat10000 3 hours ago | parent | prev [-]

Easy fix, make a new test image with six legs, and watch all the LLMs say it has five.