kergonath 4 hours ago
Tesseract does not understand layout. It’s fine for character recognition, but if I still have to pipe the output to an LLM to make sense of the layout and fix common transcription errors, I might as well use a single model. It’s also easier for a visual LLM to extract figures and tables in one pass.
chaps 4 hours ago
For my workflows, layout extraction has been so inconsistent that I've stopped attempting to use it. It's simpler to just throw everything into PostGIS and run intersection checks on size-normalized pages.
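A rough sketch of what "intersection checks on size-normalized pages" could look like, in plain Python rather than PostGIS. The `Box` type, the `normalize` and `intersects` helpers, and the header-band region are all hypothetical names made up for illustration; the comment does not describe the actual schema or queries used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (x0, y0) is the top-left, (x1, y1) the bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def normalize(box: Box, page_width: float, page_height: float) -> Box:
    """Rescale a box to a unit page so pages of different sizes are comparable."""
    return Box(
        box.x0 / page_width,
        box.y0 / page_height,
        box.x1 / page_width,
        box.y1 / page_height,
    )


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap with positive area.

    Assumption: boxes that merely touch along an edge do not count here.
    A spatial database may treat touching geometries differently.
    """
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


# Hypothetical region of interest: the top 10% of any page.
HEADER_BAND = Box(0.0, 0.0, 1.0, 0.1)

# A word box on a US Letter page (612 x 792 pt), near the top.
letter_word = normalize(Box(72, 36, 200, 50), 612, 792)
# A word box on an A4 page (595 x 842 pt), mid-page.
a4_word = normalize(Box(72, 400, 200, 414), 595, 842)

print(intersects(letter_word, HEADER_BAND))  # word falls inside the header band
print(intersects(a4_word, HEADER_BAND))      # word is well below the band
```

Because every box is rescaled to the same unit square, one region definition can be reused across page sizes instead of relying on a layout model.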
fudged71 an hour ago
I don't know how, but PyMuPDF4LLM is based on Tesseract and has GNN-based layout detection.