kumarm 6 days ago
We do this in our text-to-speech app (Read4Me): https://apps.apple.com/us/app/read4me-talk-browser-pdf-doc/i... You can scan a book and listen to it (and also copy and paste the extracted text into other apps).

If you are looking to do this at large scale in your own UI, I would recommend one of Google's solutions:
1. Google Cloud Vision API (https://cloud.google.com/vision?hl=en)
2. Gemini API OCR capabilities (start here: https://aistudio.google.com/prompts/new_chat)
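For the Cloud Vision option, here is a minimal Python sketch (standard library only) of OCR on one scanned page via the REST `images:annotate` method. It assumes you authenticate with an API key passed as the `key` query parameter, and it requests the `DOCUMENT_TEXT_DETECTION` feature, which is meant for dense text like book pages. The function names `build_request`, `extract_text`, and `ocr_image` are hypothetical helpers for illustration, not part of any Google client library.

```python
import base64
import json
import urllib.request

# REST endpoint for Cloud Vision's images:annotate method.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes) -> dict:
    """Build an images:annotate request body asking for dense-text OCR."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # DOCUMENT_TEXT_DETECTION targets dense text (e.g. book pages);
                # TEXT_DETECTION is the alternative for sparse text in photos.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the full recognized text from an images:annotate response."""
    first = (response.get("responses") or [{}])[0]
    # Per-image failures are reported in an "error" field, not an HTTP error.
    if "error" in first:
        raise RuntimeError(first["error"].get("message", "Vision API error"))
    # No fullTextAnnotation means no text was found on the page.
    return first.get("fullTextAnnotation", {}).get("text", "")


def ocr_image(path: str, api_key: str) -> str:
    """Send one image file to Cloud Vision and return the extracted text."""
    with open(path, "rb") as f:
        body = json.dumps(build_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",  # assumes API-key auth is enabled
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Usage would look like `text = ocr_image("page.jpg", "YOUR_API_KEY")`, after which the text can be handed to a TTS engine or copied elsewhere. Keeping request building and response parsing separate from the network call makes it easy to batch several pages per request or swap in a service-account token later.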