| ▲ | keepamovin a day ago | |
Yeah, I actually think that’s really smart: once you convert everything to JPEG, it’s all just images that you can ask LLMs to look at. Unfortunately, I don’t have experience with local models, but if anyone wants to point me in the right direction or email me to collab, I’d appreciate it. | ||
| ▲ | vunderba a day ago | |
There are actually a few capable VL (vision-language) models out there that run on even modest hardware. If you want to keep things simple and process everything locally, I’d recommend something like Qwen3 VL [1]. It’s not the fastest model, but you can just let it chew through the docs over a weekend. In my experience, it takes about 15 to 30 seconds per image, and the quality of the results is quite good, if a bit verbose [2]. | ||
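For anyone who wants a starting point, here is a rough sketch of the "let it chew through the docs" loop. It is not from the links above. It assumes the model is served by a local Ollama instance on its default port, and the model tag `qwen3-vl`, the `*.jpg` glob, and the helper names (`build_payload`, `describe_image`, `process_folder`) are all illustrative assumptions; swap in whatever your own runtime exposes.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumptions (not from the thread): an Ollama server on its default port,
# and a Qwen3 VL build pulled under the tag below.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3-vl"  # assumed tag; use whatever your runtime lists


def build_payload(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: Path, prompt: str) -> str:
    """Send one image to the local server and return the model's text."""
    body = json.dumps(build_payload(path.read_bytes(), prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # Generous timeout: the thread reports roughly 15 to 30 seconds per image.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


def process_folder(folder: Path, prompt: str, describe=describe_image) -> int:
    """Describe every JPEG in folder, writing <name>.txt next to each image.

    Images that already have a .txt are skipped, so the job can be resumed.
    Returns the number of images processed in this run.
    """
    done = 0
    for img in sorted(folder.glob("*.jpg")):
        out = img.with_suffix(".txt")
        if out.exists():
            continue  # already handled in an earlier run
        out.write_text(describe(img, prompt), encoding="utf-8")
        done += 1
    return done
```

Usage would be something like `process_folder(Path("scans"), "Transcribe the text on this page.")`. Skipping images that already have a `.txt` lets you kill and restart the job without losing work, which matters for a long run: at 15 to 30 seconds per image, a 48-hour weekend covers very roughly 5,760 to 11,520 images.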