Remix clone Hacker News

new | show | ask | jobs Github

	▲	WarmWash 2 days ago
		Even multimodal models are still really bad when it comes to vision. The strength is still definitely language.
	▲	nostrebored a day ago \| parent [-]
		Training for tasks still works petty well, but “vision” is a super broad domain and most seem optimized for OCR and screen processing (which have verifiable outputs and relatively straightforward data generation)