tyre a day ago

For anyone needing to do this, the answer is to convert the PDF to an image first. Far smaller, LLMs work well with images (even in some pretty insane use cases I've seen), and, combined with human review, it can be a huge productivity gain that results in structured data.
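One way to do the conversion is to rasterize each page to PNG with Poppler's `pdftoppm`. The commenter didn't name a tool, so this is my assumption. Below is a minimal sketch that assumes `pdftoppm` is installed and on PATH. `pdf_to_png_cmd` and `render_pages` are hypothetical helper names, not part of any library.

```python
import subprocess
from pathlib import Path


def pdf_to_png_cmd(pdf_path: str, out_prefix: str, dpi: int = 150) -> list[str]:
    """Build a pdftoppm command that writes one PNG per page.

    pdftoppm names its output files <out_prefix>-<page>.png.
    """
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


def render_pages(pdf_path: str, out_dir: str, dpi: int = 150) -> list[Path]:
    # Hypothetical helper: assumes pdftoppm (Poppler) is on PATH.
    # check=True raises CalledProcessError if pdftoppm exits non-zero.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = str(out / "page")
    subprocess.run(pdf_to_png_cmd(pdf_path, prefix, dpi), check=True)
    # Return the generated page images, sorted by filename.
    return sorted(out.glob("page-*.png"))
```

The resolution (`-r`) is the main knob. Higher DPI helps the model read small print but makes each image larger.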

Snoddas an hour ago

Since I'm almost never interested in the formatting, I run all PDF files through pdftotext from the Poppler library before handing them to an LLM.
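Here is a rough sketch of wrapping that step in a script. It assumes Poppler's `pdftotext` is on PATH and uses its convention that an output file of `-` sends the text to stdout. The helper names `pdftotext_cmd` and `pdf_to_text` are hypothetical.

```python
import subprocess


def pdftotext_cmd(pdf_path: str, layout: bool = False) -> list[str]:
    """Build a Poppler pdftotext command that writes plain text to stdout ("-")."""
    cmd = ["pdftotext"]
    if layout:
        # -layout tries to preserve the original physical layout; off by default.
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path: str, layout: bool = False) -> str:
    # Hypothetical helper: assumes pdftotext is installed.
    # check=True raises CalledProcessError on a non-zero exit.
    result = subprocess.run(
        pdftotext_cmd(pdf_path, layout),
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return result.stdout
```

Leaving `-layout` off matches the "don't care about formatting" use case. Turning it on can help with multi-column pages or tables.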

spindump8930 a day ago

I agree with your recommendation, but converting a PDF to an image by no means makes it smaller. PDFs are much closer to SVGs (vector drawing instructions) than to JPEGs (raster pixels).
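For a back-of-the-envelope check on the size point: the uncompressed pixel buffer of a rendered page grows with the square of the resolution. A text page in a PDF, by contrast, can be mostly compact drawing operators. This sketch assumes a US Letter page (8.5 × 11 in) rendered as 8-bit RGB. PNG/JPEG compression shrinks the actual file and real PDF sizes vary, so treat these as raw-buffer figures, not measurements.

```python
def raw_rgb_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of an 8-bit RGB raster of a page (3 bytes per pixel)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * 3


for dpi in (72, 150, 300):
    size = raw_rgb_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1e6:.1f} MB")
# 72 dpi: 1.5 MB
# 150 dpi: 6.3 MB
# 300 dpi: 25.2 MB
```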

butlike 10 hours ago

Why can't I just take a screenshot of the PDF and feed that into the LLM?