Show HN: ArXiv-txt, LLM-friendly ArXiv papers (arxiv-txt.org)
20 points by jerpint 2 days ago | 9 comments
Just change arxiv.org to arxiv-txt.org in the URL to get the paper info in markdown.

Example:
Original URL: https://arxiv.org/abs/1706.03762
Change to: https://arxiv-txt.org/abs/1706.03762

To fetch the raw text directly, use https://arxiv-txt.org/raw/abs/1706.03762. This will be particularly useful for APIs and agents.
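For scripts or agents, the URL swap described in the post can be done programmatically. This is a minimal sketch, not the site's own code. The function names `to_arxiv_txt` and `fetch_raw` are hypothetical. It only accepts the `/abs/<id>` form shown in the post and rejects other arxiv.org paths such as `/pdf/`, because the post doesn't say whether those work. It also assumes the raw endpoint returns UTF-8 text.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

ARXIV_HOSTS = ("arxiv.org", "www.arxiv.org")
TXT_HOST = "arxiv-txt.org"


def to_arxiv_txt(url: str, raw: bool = False) -> str:
    """Rewrite an arxiv.org /abs/ URL to its arxiv-txt.org equivalent.

    raw=True prefixes the path with /raw, following the
    https://arxiv-txt.org/raw/abs/<id> pattern given in the post.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ARXIV_HOSTS:
        raise ValueError(f"not an arxiv.org URL: {url!r}")
    # Assumption: only the /abs/<id> form is supported, since that is
    # the only form the post documents.
    if not parts.path.startswith("/abs/"):
        raise ValueError(f"expected an /abs/<id> URL: {url!r}")
    path = "/raw" + parts.path if raw else parts.path
    # Always emit https; keep any query string or fragment unchanged.
    return urlunsplit(("https", TXT_HOST, path, parts.query, parts.fragment))


def fetch_raw(url: str, timeout: float = 30.0) -> str:
    """Download the raw text for an arxiv.org /abs/ URL.

    Hypothetical helper: assumes the endpoint serves UTF-8 text.
    """
    with urlopen(to_arxiv_txt(url, raw=True), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `to_arxiv_txt("https://arxiv.org/abs/1706.03762", raw=True)` returns `https://arxiv-txt.org/raw/abs/1706.03762`, which is the raw URL from the post. Validating the host and path up front means a typo fails with a clear error instead of producing a URL the site may not serve.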
lgas a day ago
It just extracts the abstracts?

sbpost a day ago
The example you give doesn't seem to work; the raw text does not include the authors.

jmartin2683 a day ago
This would be awesome wrapped in an MCP server/tool call :)

westurner a day ago
If you train an LLM on only formally verified code, it should not be expected to generate formally verified code. Similarly, if you train an LLM on only published ScholarlyArticles (or only their abstracts), it should not be expected to generate publishable or true text. Traceability for retractions would be necessary to prevent lossy feedback.