| ▲ | stevenhuang 5 hours ago | |
This has been known since VLMs were a thing, that more information can be encoded visually and token efficiency is increased. But it came with performance issues (more hallucinations, etc). Also I don't think you realize how much dumb stuff is still left on the table. That the market is worth trillions is quite irrelevant here given the dynamism of the field. | ||