▲ | ninetyninenine 15 hours ago | |||||||
Guys you realize that you can go to ChatGPT right now and it can generate an actual picture of the night sky because it has seen thousands of pictures and drawings of the actual night sky right? Your logic is flawed because your knowledge is outdated. LLMs are encoding visual data, not just “language” data. | ||||||||
▲ | heyjamesknight 14 hours ago | parent [-] | |||||||
You misunderstand how the multimodal piece works. The fundamental unit of encoding here is still semantic. Not the same in your mind: you don’t need to know the word for sunset to experience the sunset. | ||||||||
|