dvt 6 hours ago
I'm using lopdf[1] for PDF parsing, rtf-parser[2] for RTF, and calamine[3] for XLSX, and I'm sure you know that DOCX/PPTX/etc. are basically just zip files of XML + text. The LLM cares about textual data (which just gets moderately cleaned up post-extraction), so I (thankfully) didn't have to deal with rendering. But showing a preview or end result to a user would be a huge plus, so I can see myself using your library.

[1] https://github.com/J-F-Liu/lopdf
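To make the "zip file of XML" point concrete, here is a minimal sketch of text-only DOCX extraction using just the Python standard library. It is not the code from the project discussed here (which uses Rust crates); the function name `extract_docx_text` is made up for illustration. It assumes the main body lives in `word/document.xml`, as in a typical DOCX package, and it ignores headers, footers, footnotes, and all formatting.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source):
    """Return the body text of a DOCX, one line per paragraph.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration; no rendering, no styles.
    """
    # A DOCX is a zip archive; the main body is one XML part inside it.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph; <w:t> holds the text of a run inside it.
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    # Build a tiny in-memory archive holding only the part this sketch reads.
    # (Not a complete DOCX; a real file has more parts, e.g. [Content_Types].xml.)
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", body)
    buf.seek(0)
    print(extract_docx_text(buf))
```

This only pulls out strings, which is enough when the consumer is an LLM. Layout, fonts, and pagination are untouched, and that is where the rendering difficulty lives.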
petilon 5 hours ago
What about rendering? That's the hard part.