msgodel 4 days ago

Multimodal Qwen is pretty good at OCR, although it's slow without a GPU.

For pure search you're almost certainly better off building an index of CLIP embeddings and then doing cosine similarity with a query embedding to find things. I have gigabytes of reaction images and memes I've been thinking about doing this with.
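
Rough sketch of the search half, in case it helps. It assumes the embeddings already exist: the filenames and 3-d vectors below are made-up stand-ins for real CLIP outputs (which have hundreds of dimensions), and `build_index` / `search` are hypothetical helper names, not anything from a CLIP library. The idea is to normalize everything once so cosine similarity becomes a plain dot product.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def build_index(embeddings):
    """Map each image name to its unit-length embedding (done once, up front)."""
    return {name: normalize(vec) for name, vec in embeddings.items()}

def search(index, query_vec, top_k=3):
    """Return the top_k (name, cosine_similarity) pairs, best match first."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), name)
        for name, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]

# Toy data: placeholder vectors standing in for real image embeddings.
toy_embeddings = {
    "cat_meme.png": [0.9, 0.1, 0.0],
    "dog_reaction.jpg": [0.1, 0.9, 0.0],
    "chart.gif": [0.0, 0.1, 0.9],
}

index = build_index(toy_embeddings)
# In practice the query vector would come from the text encoder of the same model.
results = search(index, [1.0, 0.2, 0.0], top_k=2)
for name, score in results:
    print(f"{name}\t{score:.3f}")
```

For gigabytes of images, a brute-force scan like this is probably still fine up to some tens of thousands of vectors; past that you'd likely want a vectorized matrix multiply or an approximate nearest-neighbor index instead of a Python loop.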