jimjimjim 4 hours ago

It is not by design! PDFs made from scanned documents or collections of images do need OCR, but that is true of any format you put scans or images into. These days the vast majority of PDFs don't need OCR, since the pages are just made up of text, line drawings and images. And although it can get tricky, you can edit those text, line and image commands as much as you want.

For example: add this to the content stream of a PDF page and it'll put Hello World on the page:

  BT
    /myfont 50 Tf
    100 200 Td
    (Hello World) Tj
  ET

(Note: a bit more is required to select the font etc. In particular, the name /myfont has to be defined in the page's resource dictionary.)
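
To make that "bit more" concrete, here is a rough Python sketch that wraps the content stream in a minimal one-page PDF. It is only an illustration under my own assumptions: the function name build_pdf, the output file hello.pdf, the US Letter page size and mapping /myfont to the standard Helvetica font are all my choices, not anything a real tool produces.

```python
def build_pdf(text="Hello World"):
    """Return the bytes of a minimal one-page PDF showing `text` (ASCII only)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # The content stream: the same operators as in the comment above.
    content = (
        "BT\n"
        "  /myfont 50 Tf\n"
        "  100 200 Td\n"
        f"  ({escaped}) Tj\n"
        "ET\n"
    ).encode("ascii")

    # Objects 1..5: catalog, page tree, page, font, content stream.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # The page's /Resources is where the name /myfont gets bound to a font.
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /myfont 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"endstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed for xref
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: every entry must be exactly 20 bytes long.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as f:
        f.write(build_pdf())
```

Running it should write hello.pdf, which ought to open in a normal viewer. Editing the text between BT and ET and regenerating is the simplest version of "editing the commands". Real PDFs usually compress their content streams, so you would have to decompress them first.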