mattnewton 21 hours ago
I’m saying there is basically no way to both make VLMs understand the long tail of PDFs where the layout conveys information (like charts and tables) and make that input as token-efficient as text formats. Current approaches have mostly chosen to work more often than not, at the cost of token efficiency.