krisoft 6 days ago
> The blind and visually impaired people advocating for this have been conditioned to believe that technology will solve all accessibility problems because, simply put, humans won’t do it.

Technology is not just sprouting out of the ground on its own. It is humans who are making it. Therefore, if technology is helpful, it was humans who helped.

> Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

> I fully predict that blind people will be advocating to make actual LLM platforms accessible

Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

> I also predict web accessibility will actually get worse, not better, as coding models will spit out inaccessible code that developers won’t check or won’t even care to check.

Who knows. Either that, or some pages will become more accessible because the effort of making them accessible will be lower on the part of the devs. It will probably be a mixed bag, with a little bit of column A and a little bit of column B.

> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.

These are all AI. They are all improving by leaps and bounds.

> An LLM will always be there, well, until the servers go down

Of course. That is a concern. This is why models you can run yourself are so important. Local models are good for latency and reliability, but even if the model is run on a remote server, as long as you control the server you can decide when it gets shut down.
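For illustration, here is a minimal sketch of the "describe an image with a model you run yourself" idea, assuming an OpenAI-compatible local server (such as a llama.cpp or Ollama instance) already running on localhost with a vision-capable model; the model name, port, and file path are placeholders, not part of the discussion above.

```python
# Sketch: ask a locally hosted vision model to describe an image.
# Assumes an OpenAI-compatible server (e.g. llama.cpp or Ollama) is listening
# on localhost:11434 and a vision-capable model is available; "llava",
# the port, and "photo.jpg" are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused-locally")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llava",  # placeholder name for whatever local vision model is installed
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image for a blind user, briefly and concretely."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Because the endpoint is local, the description keeps working (and stays low-latency) regardless of what happens to any hosted service.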
lxgr 6 days ago
> > Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.

TFA is from 2023, when multimodal LLMs were just picking up. I do agree that that prediction (flat capability increase) has aged poorly.

> I doubt OCR and even self-driving cars will get any significant advancements.

This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".
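As a rough sketch of what an LLM-based OCR call looks like in practice: this uses the published mistralai Python client; the method name, model string, and response fields are assumptions based on the public docs and may differ from the current SDK.

```python
# Sketch: OCR a document with Mistral's OCR model via the mistralai client.
# Method/model names are assumptions from the public docs; verify before use.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

result = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "document_url",
              "document_url": "https://example.com/scan.pdf"},  # placeholder URL
)

# Each page is returned as Markdown-formatted text.
for page in result.pages:
    print(page.markdown)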
stinkbeetle 6 days ago
> > I fully predict that blind people will be advocating to make actual LLM platforms accessible

> Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.

The AIs I have used have fairly basic interfaces - input some text or an image, get back some text or an image - is that not something accessibility tools can already handle? Or do they mean something else by "actual LLM platform"?

This isn't a rhetorical question; I don't know much about interfaces for the blind.
NoahZuniga 6 days ago
Gemini 2.5 has the best vision understanding of any model I've worked with. Leagues beyond gpt5/o4.
jibal 6 days ago
> > Let’s not mention the fact the ==> particular <== large language model, LLM called ==> Chat GPT <== they chose, was never the right kind of machine learning for the task of describing images.

> Weird. I would think LLMs are exactly the right kind of tool to describe images.
giancarlostoro 6 days ago
> Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly there is no more detail about what they think would be a better approach.

Not sure, but I've experimented with the Grok avatars (or characters, whatever). I hate the defaults that xAI made, because they refuse to act like a generic, simple AI robot until you tell them to stop flirting and calling you "babe" (seriously, what the heck, lol), but after that they can really hold a conversation. I talked to one about a musician I like, in a very niche genre of music, and it suggested an insanely relatable song from a different artist I did not know, all in real time.

I think it was last year or the year before: they did a demo with two phones, one that could see and one that could not, and the two ChatGPT instances talked to each other, one describing the room to the other. I think we are probably at the point now where a model can describe a room.