ggnore7452 | 3 days ago
I've done a similar PDF → Markdown workflow. For each page:
- Extract the text as usual.
- Capture the whole page as an image (~200 DPI).
- Optionally extract the images/graphs within the page and include them in the same LLM call.
- Optionally add a bit of context from neighboring pages.

Then wrap everything in a clear prompt (structured output, plus how you want graphs handled), and you're set. At this point, models like GPT-5-nano/mini or Gemini 2.5 Flash are cheap and strong enough to make this practical.

Yeah, it's a bit like using a rocket launcher on a mosquito, but it's very easy to implement, and quite flexible and powerful. It works across almost any format, and Markdown is both AI- and human-friendly and surprisingly maintainable.
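A minimal sketch of how the per-page inputs described above could be bundled into one request. It assumes the text and page renders have already been produced by some PDF library (PyMuPDF would be one option). `PageInput`, `build_request`, `PROMPT`, and the layout of the returned dict are all hypothetical names chosen for illustration, not any provider's API.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class PageInput:
    """Everything gathered for one PDF page before the LLM call (hypothetical type)."""
    number: int
    text: str          # text layer, extracted as usual
    page_png: bytes    # whole page rendered as an image (~200 DPI)
    figures: list[bytes] = field(default_factory=list)  # optional images/graphs


# Hypothetical prompt: asks for structured output and says how to handle graphs.
PROMPT = (
    "Convert this PDF page to Markdown. Use the page image to fix reading "
    "order and tables in the extracted text. Describe each graph in a short "
    "caption. Return only Markdown."
)


def build_request(pages: list[PageInput], index: int, context_chars: int = 500) -> dict:
    """Assemble a provider-neutral request for pages[index].

    The dict layout is an assumption; map it onto whatever client you use.
    """
    page = pages[index]

    # Optional context: tail of the previous page, head of the next one.
    before = ""
    if index > 0:
        prev_text = pages[index - 1].text
        before = prev_text[max(0, len(prev_text) - context_chars):]
    after = pages[index + 1].text[:context_chars] if index + 1 < len(pages) else ""

    def b64(data: bytes) -> str:
        return base64.b64encode(data).decode("ascii")

    return {
        "prompt": PROMPT,
        "page_number": page.number,
        "text": page.text,
        "context_before": before,
        "context_after": after,
        # Page render first, then any figures extracted from the page.
        "images": [b64(page.page_png)] + [b64(f) for f in page.figures],
    }
```

Keeping the request builder separate from the extraction step and from the model client means either side can be swapped without touching the other.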
GaggiX | 3 days ago
> are cheap and strong enough to make this practical.

It all depends on the scale you need them at. With the API, it's easy to generate millions of tokens without thinking.
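To make the scale point concrete, here is a rough back-of-the-envelope estimator. The function name, the token counts, and the prices in the usage lines are placeholders for illustration, not real pricing for any model.

```python
def estimate_cost_usd(
    pages: int,
    input_tokens_per_page: int,
    output_tokens_per_page: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate total API spend for converting `pages` pages."""
    input_tokens = pages * input_tokens_per_page
    output_tokens = pages * output_tokens_per_page
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


# Placeholder numbers only: substitute your provider's current prices
# and token counts measured on a sample of your own documents.
cost = estimate_cost_usd(
    pages=100_000,
    input_tokens_per_page=2_000,
    output_tokens_per_page=800,
    usd_per_million_input=0.10,
    usd_per_million_output=0.40,
)
print(f"estimated cost: ${cost:,.2f}")
```

The total grows linearly with page count, so a per-page cost that looks negligible can still add up on a large archive.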