billconan 4 hours ago
Mixing rendering definitions with content, as PDF does, is a holdover from the printer era and is unsuitable for the digital era. HTML is a digital format, but it was meant to be a generic format for all document types, not just papers, so it contains a lot of extras that a paper format doesn't need. Since research papers share the same structure, we can separate content from rendering even further. For example, if you later want to connect a paper to an AI, do you want to send <div class="abstract"> ... ? Or use some nasty heuristic to extract the abstract, like document.getElementsByClassName("abstract")[0] ?
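To make the contrast concrete, here is a minimal sketch of the two approaches, not something proposed in the thread itself. The first function mirrors the class-name heuristic using Python's standard html.parser. The second reads the abstract from a structured record. The JSON layout and its field names ("title", "abstract", "body") are hypothetical and exist only for illustration, as are the sample inputs.

```python
import json
from html.parser import HTMLParser


class AbstractExtractor(HTMLParser):
    """Heuristic: collect the text inside the first <div class="abstract">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # <div> nesting depth while inside the abstract
        self.done = False   # True once the first abstract div has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the abstract
        elif "abstract" in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering the abstract div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def abstract_from_html(html: str) -> str:
    """Markup route: depends on the page using class="abstract"."""
    parser = AbstractExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace left over from the markup.
    return " ".join("".join(parser.chunks).split())


def abstract_from_record(raw: str) -> str:
    """Structured route: the abstract is a named field (hypothetical schema)."""
    return json.loads(raw)["abstract"]


# Illustrative inputs only; neither comes from a real paper.
html_paper = (
    '<div class="header">Title</div>'
    '<div class="abstract"><p>We study <em>X</em>.</p></div>'
    '<div>Body</div>'
)
json_paper = '{"title": "Title", "abstract": "We study X.", "body": "Body"}'

print(abstract_from_html(html_paper))
print(abstract_from_record(json_paper))
```

The heuristic only works while every publisher keeps the same class name and nesting, whereas the structured lookup depends only on the agreed field name.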
simonw 4 hours ago
All of the interesting LLMs can handle a full paper these days without any trouble at all. I don't think it's worth spending much time optimizing for that use-case any more - that was much more important two years ago when most models topped out at 4,000 or 8,000 tokens.