rkorlimarla 11 hours ago
This is super interesting. Multimodal LLMs do not fundamentally distinguish between pictures/screenshots and text: both are turned into vectors (embeddings) in the same space. The model then processes everything together as one sequence, so the image becomes part of the same "thinking" process as the text. It feels like magic, and it is a real breakthrough. My guess is that this was not by design but a nice side effect of the core attention design. A lot of papers have been written on it, and you will find them a very interesting read.
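To make the "everything is vectorized" point concrete, here is a toy sketch in plain Python. It is only an illustration under loud assumptions: the names (`embed_text`, `embed_patches`, `self_attention`), the tiny dimensions, and the random weights are all made up, and real models use trained vision encoders and learned query/key/value projections that are skipped here. The point is just that once image patches and text tokens are vectors of the same width, attention treats them identically.

```python
import math
import random

D_MODEL = 4    # toy embedding width (assumption for illustration)
PATCH_LEN = 4  # toy number of pixel values per flattened image patch


def embed_text(token_ids, table):
    """Look up one D_MODEL-wide vector per text token id."""
    return [table[t] for t in token_ids]


def embed_patches(patches, weights):
    """Linearly project each flattened image patch to a D_MODEL-wide vector."""
    return [
        [sum(w * p for w, p in zip(row, patch)) for row in weights]
        for patch in patches
    ]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product attention.

    Simplification: the input vectors are used directly as queries, keys,
    and values; there are no learned projections.
    """
    scale = math.sqrt(len(seq[0]))
    outputs, all_weights = [], []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        all_weights.append(w)
        outputs.append(
            [sum(w_j * v[d] for w_j, v in zip(w, seq)) for d in range(len(q))]
        )
    return outputs, all_weights


rng = random.Random(0)  # fixed seed so the toy weights are reproducible
vocab_table = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(10)]
patch_proj = [[rng.uniform(-1, 1) for _ in range(PATCH_LEN)] for _ in range(D_MODEL)]

text_vecs = embed_text([3, 7, 1], vocab_table)  # three hypothetical token ids
image_vecs = embed_patches(
    [[0.1, 0.9, 0.4, 0.3], [0.8, 0.2, 0.5, 0.7]],  # two made-up patches
    patch_proj,
)

# One mixed sequence: the attention step has no idea which vectors were pixels.
sequence = image_vecs + text_vecs
outputs, attn = self_attention(sequence)
```

Each row of `attn` is one position's attention weights over the whole mixed sequence, so a text position puts nonzero weight on the image patches and vice versa. That is the sense in which the two modalities get processed together.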