vr46 7 days ago

I’ll have to test this against my local Python pipeline, which does all this without an LLM in attendance. There are a ton of existing Python libraries that have been doing this for a long time, so let’s take a look.

thegabriele 7 days ago

Care to share the best ones for some use cases? Thanks.

vr46 7 days ago

MinerU

PDFQuery

PyMuPDF (having more success with older versions right now)