th0ma5 a day ago

I personally don't understand why asking these things to do things we know they can't do is supposed to be productive. Maybe for getting around restrictions or fuzzing... I don't see it as an effective benchmark unless it can be linked directly to the ways the models are being improved. But looking at random results that are sometimes valid and thinking that more iterations of randomness will eventually give way to control is a maddening perspective to me, though perhaps I need better language to describe this.

thecr0w a day ago | parent [-]

I think this is a reasonable take. For me, I like to investigate limitations like this in order to understand where the boundaries are. Claude isn't impossibly bad at analyzing images; it's just pixel-perfect corrections that seem to be a limitation. Maybe for some folks it's enough to just read that, but I like to feel that I have some good experiential knowledge about the limitations that I can keep in my brain and apply appropriately in the future.

th0ma5 6 hours ago | parent [-]

Yes, but follow this forward: what about current models would be informative about future models? We've seen waves of "insight" come and go, to the point where there are endless waves of people at different points in the journey: there's a cohort of people who would be upset at the statement that prompt engineering is useless, others who would support that as exactly right, and still more who would redefine the word "prompt" to include many other things. This is my exact complaint. You would want it to work the way you hope it works, with our collective discoveries turning into education and learning. But the content in the models, and the subsequent inference based on that content, has not behaved like the physical sciences, where discoveries provide universal and reliable knowledge.