▲ gibsonsmog 2 hours ago

I just cracked open macOS VoiceOver for the first time in a while and hoo boy, you weren't kidding. I wonder if you could still "stun" an LLM with this technique while also using some aria-* tags, so the original text isn't so incredibly hostile to screen readers. Regardless, as neat as this tool is, I think it's an awful pattern, and hopefully no one uses it except as part of bot-capture stuff.
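A hypothetical sketch of that aria-* idea (the text and attribute values here are illustrative, not from the tool itself): the scrambled glyphs stay in the DOM, but `aria-hidden` removes them from the accessibility tree, while `aria-label` on the wrapper supplies readable text to screen readers.

```html
<!-- Scrambled Unicode stays visible for sighted users, but is hidden
     from assistive technology; aria-label carries the readable copy. -->
<p aria-label="This text resists naive scraping.">
  <span aria-hidden="true">𝕿𝖍𝖎𝖘 𝖙𝖊𝖝𝖙 𝖗𝖊𝖘𝖎𝖘𝖙𝖘 𝖓𝖆𝖎𝖛𝖊 𝖘𝖈𝖗𝖆𝖕𝖎𝖓𝖌.</span>
</p>
```

The obvious trade-off: any scraper that reads `aria-label` gets the plain text back, which is presumably why a copy-hostile tool wouldn't do this.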
▲ lxgr 8 hours ago

Do screen readers fall back to OCR by now? I could imagine that being critical based on the large amount of text in raster images (often used for bad reasons) on the Internet alone.
▲ gostsamo 7 hours ago

No, but they have handling for unknown symbols: they either read aloud a substitute or spell the text letter by letter. Both suck.
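For context on why the symbols are unknown: obfuscation tricks like the one under discussion often swap ASCII letters for Unicode look-alikes such as the Mathematical Alphanumeric Symbols (an assumption here; other techniques exist). A minimal Python sketch showing how NFKC compatibility normalization can fold many of those characters back to plain ASCII, which is one cheap recovery path short of full OCR:

```python
import unicodedata

# "hello" spelled with MATHEMATICAL FRAKTUR SMALL letters -- characters
# most screen readers have no friendly way to read aloud.
obfuscated = "\U0001d525\U0001d522\U0001d529\U0001d529\U0001d52c"

# NFKC normalization applies compatibility decompositions, mapping the
# fraktur look-alikes back to ordinary ASCII letters.
recovered = unicodedata.normalize("NFKC", obfuscated)
print(recovered)  # → hello
```

This only helps against look-alike substitution; text rasterized into images, or scrambled at the glyph level inside a custom font, would still need OCR.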
▲ lxgr 6 hours ago

Sounds like a potentially useful improvement, then. I've had more success exporting text from some PDFs that way (not scanned pages, but text typeset using some extremely cursed process that breaks accessibility) than via "normal" PDF-to-text methods.
▲ gostsamo 4 hours ago

No, it is not. Simple OCR is slow and much more expensive than an API call to the given process. Worse, it is also error-prone and cannot follow the focus in real time. And no, adding AI does not make it better: AI is useful when everything else fails and it is worth waiting 10 seconds for an incomplete and partially hallucinated screen description.
▲ lxgr 3 hours ago

> simple ocr is slow

Huh? Running a powerful LLM over a screenshot can take longer, but macOS's/iOS's default "extract text" feature, for example, has been pretty much instant for me.
▲ gostsamo an hour ago

Is "pretty much instant" still true when jumping between buttons, partially announcing what you land on while looking for something else? Can it represent a GUI in enough detail to navigate it, open combo boxes, multi-selects, and whatever else? Can it tell the difference between an image of a button and the button itself? Can it move fast enough that you can edit text while moving back and forth?

OCR with possible prefetch is not the same as object recognition and manipulation. Screen readers are not text readers: they build a model of the screen that can be navigated and interacted with. Modern screen readers do have OCR capabilities, and AI add-ons as well. Still, having the information ready to serve, in a manner that allows follow-up action, is much better.