lxgr 6 days ago
> > Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

TFA is from 2023, when multimodal LLMs were just picking up. I do agree that that prediction (flat capability increase) has aged poorly.

> I doubt OCR and even self-driving cars will get any significant advancements.

This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".