mauryaudayan 3 hours ago

llamaparse also does this, so what is different here?

gergelycsegzi an hour ago

As in my other comment: we assume that llamaparse and similar tools can provide OCR for individual pages. But once you have that, integrating it into your workflows often adds complexity, because you have to combine results from different sources. Here is a deeper dive I wrote on the complexities of building extraction pipelines: https://www.parsewise.ai/doc-processing-pipelines

maxhofer 2 hours ago

Mostly cross-document reasoning at scale (e.g., 90k-page corpora), as opposed to doc-to-markdown conversion.