I recently did this for Indo-Aryan languages (800+ PDFs containing scanned images). I used Google Gemini,