dimal 4 hours ago

Perfect is the enemy of good. HTML is good enough. Let’s get this done.

And as another commenter has pointed out, HTML does exactly what you ask for. If it’s done correctly, it doesn’t contain font sizes or layout. Users can style HTML differently with custom CSS.

billconan 4 hours ago | parent [-]

mixing rendering definitions with content, as PDF does, is a holdover from the printer era and is unsuitable for the digital era.

HTML is a digital format, but it was meant to be a generic format for all document types, not just papers, so it contains a lot of extras that a paper format doesn't need.

for research papers, since they share the same structure, we can further separate content from rendering.

for example, if you want to later connect a paper with an AI, do you want to send <div class="abstract"> ... ?

or do some nasty heuristic to extract the abstract? like document.getElementsByClassName("abstract")[0] ?
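
to make the contrast concrete, here is a rough sketch in Python (standard library only). everything named here is an assumption for illustration: the `abstract` class name, the `extract_abstract` helper, and the shape of the hypothetical `paper` record. it is not a real paper format or anyone's actual pipeline, just the heuristic route next to the structured one.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class AbstractExtractor(HTMLParser):
    """Heuristic: collect the text of the first element whose class
    list contains 'abstract' (an assumed convention, not a standard)."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0       # nesting depth inside the abstract element
        self.found = False   # True once an abstract element has been opened
        self.done = False    # True once that element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID:
            return
        if self.depth:
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if "abstract" in classes:
                self.depth = 1
                self.found = True

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_abstract(html: str) -> Optional[str]:
    """Return the whitespace-normalized abstract text, or None if absent."""
    parser = AbstractExtractor()
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.chunks).split())


# Heuristic route: depends on the page author having used class="abstract".
page = '<h1>Title</h1><div class="abstract"><p>We study <em>X</em> in depth.</p></div>'
from_html = extract_abstract(page)

# Structured route: a hypothetical content-only record, no markup to parse.
paper = {"title": "Title", "abstract": "We study X in depth."}
from_record = paper["abstract"]
```

the first route silently returns None (or the wrong block) as soon as a publisher picks a different class name, which is the fragility I mean. the second route has nothing to guess.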

simonw 4 hours ago | parent [-]

All of the interesting LLMs can handle a full paper these days without any trouble at all. I don't think it's worth spending much time optimizing for that use-case any more - that was much more important two years ago when most models topped out at 4,000 or 8,000 tokens.