philipkglass 2 days ago
If you have a basic ARM MacBook, GLM-OCR is the best single model I have found for OCR with good table extraction/formatting. It's a compact 0.9B-parameter model, so it'll run on systems with only 8 GB of RAM: https://github.com/zai-org/GLM-OCR

Use mlx-vlm for inference: https://github.com/zai-org/GLM-OCR/blob/main/examples/mlx-de...

Then you can run a single command to process your PDF.
My test document contains scanned pages from a law textbook: two columns of text with a lot of footnotes. Processing 5 pages took 60 seconds on an MBP with an M4 Max chip.

After it's done, you'll have a directory output/example/ that contains .md and .json files. The .md file contains a markdown rendition of the complete document. The .json file contains individual labeled regions from the document along with their transcriptions. If you select the table regions from the JSON file, you can get an HTML-formatted table from each "content" section of those objects.

It might still be inaccurate -- I don't know how challenging your original tables are -- but it shouldn't be terribly slow. The tables it produced for me were good. I have also built more complex workflows that use a mixture of OCR-specialized models and general-purpose VLM models like Qwen 3.5, along with software to coordinate and reconcile operations, but GLM-OCR by itself is the best first thing to try locally.
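The JSON-to-table step can be sketched in a few lines of Python. The "type" and "content" field names here are assumptions about the region schema (the actual keys depend on your GLM-OCR version), so check them against your own output/example/*.json before relying on this:

```python
import json

def extract_tables(regions):
    """Return the HTML 'content' of every table-labeled region.

    Assumes each region object carries a 'type' field and holds its
    HTML table markup under 'content' -- hypothetical field names,
    verify against your actual output.
    """
    return [r["content"] for r in regions if r.get("type") == "table"]

# Example with a made-up region list shaped like the assumed schema:
regions = [
    {"type": "text", "content": "Some paragraph of body text."},
    {"type": "table", "content": "<table><tr><td>1</td></tr></table>"},
]
tables = extract_tables(regions)
print(tables)
```

In a real run you'd replace the inline list with something like json.load(open("output/example/yourdoc.json")) and adapt the key names to whatever the file actually contains.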
davidbjaffe 19 hours ago
Cool! For GLM-OCR, do you use "Option 2: Self-host with vLLM / SGLang"? And in that case, am I correct that there is no internet connection involved, so connection timeouts would be avoided entirely?
polishdude20 a day ago
Thanks! Just tried it on a 40-page PDF. It seems to work for single images, but the large PDF gives me connection timeouts.