Remix clone Hacker News

	▲	tecoholic 2 hours ago
		> Converts an image to a single-page PDF with a hidden text layer using Tesseract. This is the 'State Preservation' step.

		Does this mean the text-only PDF page is turned into an image that covers the full page, with the text still underneath? So any machine-based extraction would still get the text but would probably lose all the bounding-box information, and regular users could no longer select text with the mouse?