bobosha | 2 days ago

I’m working on a new vision-language model architecture called Onida. Our aim is to match, or even surpass, the performance of leading VLMs such as LLaVA and CogVLM while operating at a fraction of the cost. Unlike most existing VLMs, which bolt vision components onto a language model after the fact, Onida is designed from first principles around a truly integrated approach. This document [1] outlines our key differentiators, and we’re now inviting beta participants to explore and test the technology.

[1] https://healthio.notion.site/Onida-Efficient-VLM-Architectur...
I’m working on a new vision-language model architecture called Onida. Our aim is to match—or surpass—the performance of leading VLMs like LLavA and CogVLM, while operating at a fraction of the cost. Unlike most existing VLMs, which layer vision components onto a language model as an afterthought, Onida is designed from first principles with a truly integrated approach. This document [1] outlines our key differentiators, and we’re now inviting beta participants to explore and test the technology. [1] https://healthio.notion.site/Onida-Efficient-VLM-Architectur... |