| ▲ | ben_w 17 hours ago | |||||||
I've only noticed that combination (SOTA models failing at short everyday tasks) in image comprehension, not text. For example, some models will misclassify my American black nightshade* weeds as tomatoes, but I get consistently OK text output from good models unless it's a trick question. * I reckon, at least; looked like this to me: https://en.wikipedia.org/wiki/Solanum_americanum#/media/File... | ||||||||
| ▲ | iLoveOncall 13 hours ago | |||||||
Both the METR research and my comment are exclusively about software development tasks. | ||||||||