| ▲ | slazien a day ago | |
"We compiled these Epstein estate emails from the House Oversight Committee release by converting the PDFs to structured text with an LLM" and: "Data Sources
Technology: Document parsing and extraction powered by reducto" | ||
| ▲ | dvrp 19 hours ago | parent [-] | |
Yes. Also, many of them were PPM images (or encoded as such) inside PDFs, and then I used cheap, lightweight multimodal LLMs to classify documents from photos. It was surprisingly cheap: under $1 for a few thousand PDFs/images. | ||