trevor-e 6 hours ago
I've been thinking lately that maybe we need a new AI-friendly file format rather than continuing to hack on top of PDF's complicated spec. PDF was designed for consistent, portable page rendering; being easily parseable was not a goal afaik, which is why we have to jump through these crazy hoops. If you've ever looked at how text is stored internally in PDF, this becomes immediately obvious. I've been toying with an idea for a new format that stores text naturally and captures semantics (e.g. to help with table parsing), but also preserves formatting rules so you can still achieve fairly consistent rendering. This format could be easily converted to PDF, although the opposite conversion would have the usual challenges. The main challenge is distribution, of course.
s0rce 2 hours ago
Doesn't LaTeX do this?
Jaxan 5 hours ago
Wouldn’t it be better to invest in a human-friendly format first (which could also be AI-friendly)?