There is also Unstract, which is open source. It does structured data extraction plus ETL. https://github.com/Zipstack/unstract