| ▲ | gostsamo 4 hours ago | parent [-] | | no, it is not. simple ocr is slow and much more expensive than an api call to the given process. on the positive side, it is also error prone and cannot follow the focus in real time. no, adding ai does not make it better. AI is useful when everything else fails and it is worth waiting 10 seconds for an incomplete and partially hallucinated screen description. | | |
| ▲ | lxgr 3 hours ago | parent [-] | | > simple ocr is slow Huh? Running a powerful LLM over a screenshot can take longer, but for example macOS's/iOS's default "extract text" feature has been pretty much instant for me. | | |
| ▲ | gostsamo an hour ago | parent [-] | | is "pretty much instant" true when jumping between buttons, partially saying what you are landing on while looking for something else? can it represent a gui in enough detail to navigate it, open combo boxes, multi-selects and whatever? can it tell the difference between an image of a button and the button itself? can it move fast enough so that you can edit text while moving back and forth? ocr with possible prefetch is not the same as object recognition and manipulation. screen readers are not text readers, they create a model of the screen that can be navigated and interacted with. modern screen readers have ocr capabilities. they have ai addons as well. still, having the information ready to serve in a manner that allows follow-up action is much better. |