Remix clone Hacker News

	▲	easterncalculus 10 hours ago
		>weather predictions
		wrong
		>OCR
		less accurate and efficient than existing solutions, only measures well against other LLMs
		>tts, stt
		worse
		>language translation
		maybe
	▲	logicprog 6 hours ago
		>>weather predictions
		>wrong
		https://www.noaa.gov/news-release/noaa-deploys-new-generatio... ?
		>>OCR
		>less accurate and efficient than existing solutions, only measures well against other LLMs
		Where did you hear that? On every benchmark I've ever seen, VLMs are hilariously better than traditional OCR. Typically, the reason language models are only compared to other language models on model cards for OCR and so on is precisely that VLMs are so much better than traditional OCR that it's not even worth comparing. Not to mention that top-of-the-line traditional OCR systems like AWS Textract are themselves extremely slow and computationally expensive, and much more complex to maintain.
		>>tts, stt
		>worse
		Literally the first and only usable speech-to-text system that I've gotten on my phone is explicitly based on a large language model. Not to mention stuff like Whisper, WhisperX, and Parakeet: all of the state-of-the-art speech-to-text systems are large-language-model based and are significantly faster and better than what we had before. Likewise for text-to-speech: even Kokoro-82M is faster and better than what we had before, and again, it's based on the same technology.