| ▲ | bandrami 3 hours ago | |
If you say the image models don't "see" you also have to say the text models don't "read": there's a meaningful case to be made for either claim but then you're left saying "they behave as if they see" or "they behave as if they read". | ||