Remix.run Logo
rkorlimarla 11 hours ago

this is super interesting - LLM's do not distinguish between pictures/screenshots and text - all are vectorized. LLM's process everything together and is part of the thinking process- it is magic and breakthrough.. My guess is that this was not by design but a nice after-effect of the core attention design.. a lot of papers are written on it - you will find it a very interested read.