recent paper on “ How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks” [1]
[1] https://arxiv.org/abs/2507.01955