emilburzo 4 hours ago
I just tested it on one of my nemeses: PDF bank statements. They're surprisingly tough to work with if you want to get clean, structured transaction data out of them. The JSON extract actually looks pretty good and seems to produce something usable in one shot, which is much better than any of the other tools I've tried so far, but I still need to check it in more depth. Sharing here in case someone chimes in with "hey, doofus, $magic_project already solves this."
vortex_ape 3 hours ago
Camelot[1] worked very well for me with bank statements. Disclaimer: I'm one of the core contributors.
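Table extraction usually gets you rows of strings, not transactions. With Camelot, `camelot.read_pdf(...)` returns tables whose `.df` attribute is a pandas DataFrame, so `tables[0].df.values.tolist()` gives a list of rows. Below is a minimal sketch of the cleanup step after that, assuming a hypothetical three-column layout (date, description, amount) and a `DD/MM/YYYY` date format. `rows_to_transactions`, `parse_amount`, and `DATE_FORMAT` are illustrative names, not part of Camelot. Rows without a parsable date are treated as wrapped description lines.

```python
import json
import re
from dataclasses import dataclass, asdict
from datetime import datetime
from decimal import Decimal

# Assumption: your statement uses day/month/year dates; adjust as needed.
DATE_FORMAT = "%d/%m/%Y"


@dataclass
class Transaction:
    date: str          # ISO 8601 date, e.g. "2024-01-03"
    description: str
    amount: str        # Decimal kept as a string so JSON preserves it exactly


def parse_amount(raw: str) -> Decimal:
    # Drop currency symbols, spaces and thousands separators; keep sign and decimal point.
    cleaned = re.sub(r"[^0-9.\-]", "", raw)
    return Decimal(cleaned)


def rows_to_transactions(rows: list[list[str]]) -> list[Transaction]:
    """Turn raw table rows (hypothetical date/description/amount layout) into transactions."""
    transactions: list[Transaction] = []
    current = None
    for row in rows:
        date_cell, desc_cell, amount_cell = (cell.strip() for cell in row[:3])
        try:
            date = datetime.strptime(date_cell, DATE_FORMAT).date()
        except ValueError:
            # No date: a header row (skipped) or a wrapped description line
            # that belongs to the previous transaction.
            if current is not None and desc_cell:
                current.description += " " + desc_cell
            continue
        current = Transaction(date.isoformat(), desc_cell, str(parse_amount(amount_cell)))
        transactions.append(current)
    return transactions


def to_json(transactions: list[Transaction]) -> str:
    return json.dumps([asdict(t) for t in transactions], indent=2)
```

Real statements vary a lot (separate debit/credit columns, negative amounts in parentheses, balance rows in the footer), so the column mapping and the continuation-line rule are the parts you would most likely need to adapt.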
dleeftink 3 hours ago
For 'zoned' extraction, Cermine[0] may be of use as a pre-processing step. Mileage may vary, as it's tailored towards papers.