| ▲ | piva00 6 hours ago | |||||||
What? LLMs were designed for text; it's in the name, "large language model". Only with specialised encoders such as vision transformers were they able to process images as well, so you're absolutely wrong about the original design intent. In the end you just added misinformation. Just save the comment to your favourites and set a reminder to check it again in a few years, like you wanted. | ||||||||
| ▲ | hparadiz 6 hours ago | parent [-] | |||||||
The first technological breakthroughs were in face and red-eye detection in 2003. Then came object detection, between 2008 and 2012. Text models didn't become useful until about 2016. Please watch the first course of Dr Fei-Fei Li's lectures on the subject. | ||||||||