kumarm 6 days ago

We do this in our text-to-speech app (Read4Me): https://apps.apple.com/us/app/read4me-talk-browser-pdf-doc/i...

You can scan a book and listen to it, and also copy and paste the extracted text into other apps.

If you are looking to do this at scale in your own UI, I would recommend either of Google's solutions:

1. Google Cloud Vision API (https://cloud.google.com/vision?hl=en)

2. Gemini API OCR capabilities (start here: https://aistudio.google.com/prompts/new_chat)
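
For option 1, here is a rough sketch of what a call could look like, not what Read4Me actually does. It assumes the Cloud Vision REST endpoint `images:annotate` with the `DOCUMENT_TEXT_DETECTION` feature and an API key for auth; the function names (`build_ocr_request`, `extract_text`, `ocr_page`) are made up for illustration, and it uses only the Python standard library.

```python
import base64
import json
import urllib.request

# Cloud Vision REST endpoint for image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for one scanned page (hypothetical helper)."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # DOCUMENT_TEXT_DETECTION targets dense text such as book pages.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the full page text, or "" if nothing was detected."""
    first = (response.get("responses") or [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")


def ocr_page(image_bytes: bytes, api_key: str) -> str:
    """Send one page to the API and return its text (needs network and a valid key)."""
    body = json.dumps(build_ocr_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

You would call `ocr_page(open("page.jpg", "rb").read(), api_key)` once per scanned page and hand the returned string to your TTS engine. At real volume you would probably want batching, retries, and service-account auth instead of a bare API key.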