Sobrino 5 hours ago
I worked at an AI (or, well, ML) consultancy before the ChatGPT moment. I remember a project where we had to extract data from a huge number of documents (country-wide, terabytes of scanned PDFs). We had to set up a pipeline that looked a bit like this: download the PDF of the scan -> Tesseract to get a text layer -> clean it up with a language-specific BERT model -> detect paragraphs of a certain type -> look them up against a database we had built of scored similar paragraphs -> make recommendations. The documents were not standardized, and a lot of them were historical: handwritten, or with scratched-out text and corrections. We had student workers spending days labeling the data. It took us months to get it all working with high accuracy. We were so proud. Now you can do it all with a prompt and a ChatGPT call.
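For readers who have not built one of these, here is a rough sketch of the shape of such a pipeline. It is not the project's code. The OCR, cleanup, and paragraph-detection stages are passed in as plain callables (in the real thing those would be Tesseract, a BERT model, and a trained classifier), `ScoredParagraph`, `lookup`, and `run_pipeline` are hypothetical names, and `difflib` string similarity is only a stand-in for whatever similarity measure the real database used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


@dataclass
class ScoredParagraph:
    text: str
    score: float  # hypothetical score stored alongside each reference paragraph


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines, normalise whitespace, drop empty chunks."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def lookup(paragraph: str, database: List[ScoredParagraph],
           top_k: int = 3) -> List[Tuple[float, ScoredParagraph]]:
    """Rank database entries by string similarity (stand-in for a real matcher)."""
    ranked = sorted(
        ((SequenceMatcher(None, paragraph.lower(), e.text.lower()).ratio(), e)
         for e in database),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:top_k]


def run_pipeline(pdf_bytes: bytes,
                 ocr: Callable[[bytes], str],
                 clean: Callable[[str], str],
                 is_target: Callable[[str], bool],
                 database: List[ScoredParagraph]):
    """OCR -> cleanup -> paragraph detection -> similarity lookup."""
    text = clean(ocr(pdf_bytes))
    results = []
    for paragraph in split_paragraphs(text):
        if is_target(paragraph):  # keep only paragraphs of the wanted type
            results.append((paragraph, lookup(paragraph, database)))
    return results
```

Each stage is injected, so a stub can stand in for OCR while the lookup is being tuned, and any single stage can be swapped out without touching the rest.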
archagon 5 hours ago
I'm pretty sure that "a ChatGPT call" will happily add or fudge stuff in your scanned PDFs. That sounds like a massive liability.
ok123456 5 hours ago
And now you can do all of that locally with qwen3.6:35b.