msgodel 4 days ago
Multimodal Qwen is pretty good at OCR, although it's pretty slow without a GPU. For pure search you're almost certainly better off building an index of CLIP embeddings and then ranking images by cosine similarity against a query embedding. I have gigabytes of reaction images and memes I've been thinking about doing this with.
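A rough sketch of the search side, in case it helps anyone trying this. It assumes you've already run each image through a CLIP image encoder and your query text through the matching text encoder (e.g. via open_clip or Hugging Face transformers, not shown here). Because CLIP puts images and text in a shared space, the two can be compared directly. `EmbeddingIndex` and the toy 3-D vectors are made up for illustration; real CLIP embeddings have hundreds of dimensions, and for a big collection you'd want numpy or a vector library instead of this brute-force pure-Python loop.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class EmbeddingIndex:
    """Hypothetical brute-force index mapping image paths to unit-length embeddings."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, key: str, embedding: Sequence[float]) -> None:
        # Normalize once at insert time so each query only needs dot products.
        self._vectors[key] = normalize(embedding)

    def search(self, query: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        q = normalize(query)
        scored = (
            (key, sum(a * b for a, b in zip(q, vec)))
            for key, vec in self._vectors.items()
        )
        # Keep the k highest cosine similarities, best match first.
        return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy vectors standing in for precomputed CLIP image embeddings.
    index = EmbeddingIndex()
    index.add("cat_meme.png", [0.9, 0.1, 0.0])
    index.add("dog_reaction.jpg", [0.1, 0.9, 0.0])
    index.add("surprised_face.png", [0.0, 0.2, 0.95])

    # In practice this would be the CLIP text embedding of something like "a cat".
    for path, score in index.search([1.0, 0.0, 0.0], k=2):
        print(f"{path}: {score:.3f}")
```

Normalizing at insert time is the main design choice: once every stored vector has unit length, cosine similarity collapses to a plain dot product, which is also what makes it easy to swap in a matrix multiply or an approximate-nearest-neighbor library later.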