jimmySixDOF 9 hours ago

Docling from IBM and MarkItDown from Microsoft are reasonably reliable, if you haven't tried them. It's also worth taking the extra step of getting plain-text image summaries from a VLM, which is useful if you want to feed the final results to an LLM later. Or first try skipping all that with Jina Reader or Firecrawl llmstxt: they extract directly from the website, which is simple, but sometimes it works and sometimes it doesn't.
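A rough sketch of the "try the hosted reader first, fall back to a local converter" idea, in case it helps. It relies on Jina Reader's URL-prefix form (`https://r.jina.ai/` followed by the page URL). Everything else is an assumption of mine: the function names, the `min_chars` cutoff used to decide that an extraction "didn't work", and the `fallback` callable, which stands in for whatever local tool you prefer (Docling, MarkItDown) and is not shown here.

```python
from typing import Callable, Optional
from urllib.request import Request, urlopen

JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(page_url: str) -> str:
    """Build the Jina Reader URL by prefixing the page URL."""
    return JINA_READER_PREFIX + page_url


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and decode the body as UTF-8 text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract(
    page_url: str,
    fetch: Callable[[str], str] = fetch_text,
    fallback: Optional[Callable[[str], str]] = None,
    min_chars: int = 200,  # assumed heuristic, not a documented threshold
) -> str:
    """Try the hosted reader first; use the fallback if it fails or returns too little."""
    try:
        text = fetch(reader_url(page_url))
    except OSError:
        # urllib's URLError/HTTPError and socket timeouts are all OSError subclasses.
        text = ""
    if len(text.strip()) >= min_chars:
        return text
    if fallback is None:
        raise RuntimeError(f"reader returned too little text for {page_url}")
    # Hypothetical hook: plug in a local converter (e.g. Docling or MarkItDown) here.
    return fallback(page_url)
```

Passing `fetch` and `fallback` as parameters keeps the hosted-versus-local decision in one place, so you can swap in Firecrawl or a different converter without touching the rest.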