▲ | chaitimes 20 hours ago | |
I would assume there's enough training data by now for models to extrapolate the answers to these basic tests from the visuals. Why do they fail miserably on such trivial questions while appearing to perform very well on complicated tasks like 3D object generation? | ||
▲ | yorwba 18 hours ago | |
There are unlikely to be many six-fingered hands in the training data, so there's little reason for the model to develop the ability to recognize one when it encounters it. Maybe the result improves if you break the task down into two steps: first list the bounding boxes of all fingers in the image, then count the bounding boxes. |
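A rough sketch of that two-step idea, in case it helps. It assumes the model can be prompted to answer step one with a JSON list of `[x_min, y_min, x_max, y_max]` boxes, one per finger; that output format, the names `parse_boxes` and `count_fingers`, and the sample response are all made up for illustration, not any particular model's API. Step two is then plain counting done outside the model.

```python
import json

Box = tuple[float, float, float, float]


def parse_boxes(model_output: str) -> list[Box]:
    """Parse step one's answer: a JSON list of [x_min, y_min, x_max, y_max]."""
    boxes: list[Box] = []
    for item in json.loads(model_output):
        x0, y0, x1, y1 = (float(v) for v in item)
        # Skip degenerate boxes with zero or negative width/height.
        if x1 > x0 and y1 > y0:
            boxes.append((x0, y0, x1, y1))
    return boxes


def count_fingers(model_output: str) -> int:
    """Step two: count the boxes instead of asking the model for a number."""
    return len(parse_boxes(model_output))


# Hypothetical step-one response for a six-fingered hand (invented values).
example_output = """
[[10, 40, 22, 90], [26, 20, 38, 88], [42, 12, 54, 86],
 [58, 16, 70, 87], [74, 28, 86, 89], [90, 50, 104, 95]]
"""

print(count_fingers(example_output))
```

Whether the model lists the sixth finger in step one is still the open question; this only takes the counting itself out of the model's hands.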