lxgr 6 days ago

> > Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

TFA is from 2023, when multimodal LLMs were just starting to take off. I do agree that that prediction (flat capability increase) has aged poorly.

> I doubt OCR and even self-driving cars will get any significant advancements.

This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".