mark_l_watson 13 hours ago

This has been my question also: I spend a lot of time experimenting with local models and almost all of my use cases involve text data, but having image processing and understanding would be useful.

How much do I give up (in performance, and in running on my 32GB M2 Pro Mac) by using the VL version of a model? For MoE models, hopefully not much.

thot_experiment 5 hours ago

All the Qwen flavors have a VL version, and the vision encoder is a separate tensor stack, just a bit of extra VRAM if you want to keep it resident. Vision-based queries take longer to process context, but generation is still fast.

I think the model itself is actually "smarter" because they split the thinking and instruct models, so both modalities get better in their respective model.

I use it almost exclusively to OCR handwritten todo lists into my todo app, and I don't think it's missed yet; it does a great job of tool-calling everything.
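
That workflow can be sketched as a request against an OpenAI-compatible local server (as exposed by llama.cpp or Ollama); the model name and the `add_todo` tool below are illustrative assumptions, not the commenter's actual setup:

```python
import base64
import json

def build_ocr_request(image_bytes: bytes) -> dict:
    """Build a chat-completions payload that asks a local Qwen-VL model to
    read a handwritten todo list and call add_todo once per item.
    The model name and tool schema are hypothetical placeholders."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "qwen2.5-vl",  # placeholder: whatever tag the local server uses
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe each todo item and call add_todo for it."},
                # Image is inlined as a base64 data URL, the usual shape for
                # OpenAI-compatible vision endpoints.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "tools": [{
            "type": "function",
            "function": {
                "name": "add_todo",  # hypothetical tool exposed by the todo app
                "description": "Add one todo item to the list.",
                "parameters": {
                    "type": "object",
                    "properties": {"text": {"type": "string"}},
                    "required": ["text"],
                },
            },
        }],
    }

# Example: assemble (but don't send) a request for a scanned page.
payload = build_ocr_request(b"\x89PNG...fake image bytes...")
print(json.dumps(payload)[:80])
```

The model's reply would then carry one `tool_calls` entry per transcribed item, which the app executes locally.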