xyst 6 days ago
Seeing blind recommendations for AI slop is very disappointing for HN. For OP, there is a library written in Rust that can do exactly what you need, with very high accuracy and good performance [1]. You would need the OCR dependencies to get it to work on scanned books [2].
[1] https://github.com/yobix-ai/extractous
[2] https://github.com/yobix-ai/extractous?tab=readme-ov-file#-s...
cess11 6 days ago
That looks rather nice, actually. Thanks. I especially like the approach to graalify Tika.