LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives

SiempreViernes 2 hours ago | parent | next [-]

As far as I can tell they trawled a big archive for sensitive information, (unsurprisingly) found some, and then didn't try to contact anyone affected before telling the world "hey, there are login credentials to be found in here".


crote 2 hours ago | parent | next [-]

Don't forget giving it a fancy name in the hope that it'll go viral!

I am getting so tired of every vulnerability getting a cutesy pet name and pretending to be the new Heartbleed / Spectre / Meltdown...

	wongarsu 2 hours ago | parent [-]
		Beats having to remember and communicate CVE numbers


KeplerBoy an hour ago | parent | prev [-]

It's not like every datapoint comes with the email of the corresponding author.


mseri 2 hours ago | parent | prev | next [-]

Google has a great aid to reduce the attack surface: https://github.com/google-research/arxiv-latex-cleaner

	Y_Y 39 minutes ago | parent [-]
		I use this before submission and recommend others do too. If I were in charge of arXiv, I'd have it integrated as an optional part of the submission process.
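For anyone curious what the comment-stripping part of such a cleanup looks like, here is a minimal sketch in Python. It is not the arxiv-latex-cleaner implementation; the function name `strip_latex_comments` is hypothetical, and the regex deliberately ignores edge cases such as `verbatim` environments or a `%` that follows an escaped backslash (`\\%`). Trailing comments are reduced to a bare `%` so that LaTeX's end-of-line whitespace handling stays the same.

```python
import re

# An unescaped % and everything after it on the same line.
# (?<!\\) skips \%, which is a literal percent sign in LaTeX.
# Assumption: verbatim environments and "\\%" are not handled.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_latex_comments(source: str) -> str:
    """Remove comment text from LaTeX source (illustrative sketch only)."""
    cleaned = []
    for line in source.splitlines():
        if line.lstrip().startswith("%"):
            # The whole line is a comment: drop it entirely.
            continue
        # Keep a bare % in place of a trailing comment so the
        # line-ending behaviour of the compiled document is unchanged.
        cleaned.append(_COMMENT.sub("%", line))
    return "\n".join(cleaned)


example = "Accuracy is 95\\% % TODO: rerun\n% password: hunter2\n\\section{Intro}"
print(strip_latex_comments(example))
```

The real tool does considerably more than this, so treat the sketch as an illustration of the idea rather than a replacement.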


barthelomew an hour ago | parent | prev | next [-]

Paper LaTeX files often contain surprising details. When a paper lacks code, looking at the LaTeX source has become part of my reproduction workflow. The comments often reveal non-trivial insights. Often, they reveal a simpler version of the methodology section (which, for the sake of apparent "novelty", is purposely obscured with mathematical jargon).

	seg_lol 13 minutes ago | parent [-]
		Reading the LaTeX equations also makes for easier (LLM) translation into code than trying to read the PDF.
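As a rough illustration of pulling equations out of a paper's source before handing them to a translation step, here is a minimal Python sketch. The function name `extract_equations` is hypothetical, and the sketch assumes the paper uses only the `equation` and `align` environments (starred or not); inline math, `\[ ... \]`, and custom macros are not handled.

```python
import re

# Matches \begin{equation}...\end{equation} and \begin{align}...\end{align},
# including starred variants. The backreference \1 ensures the closing
# environment name matches the opening one.
_EQUATION = re.compile(
    r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}",
    re.DOTALL,
)


def extract_equations(source: str) -> list[str]:
    """Return the bodies of display-math environments (illustrative sketch)."""
    return [match.group(2).strip() for match in _EQUATION.finditer(source)]


example = (
    "Text\n\\begin{equation}\nE = mc^2\n\\end{equation}\n"
    "more\n\\begin{align*}\na &= b\n\\end{align*}"
)
for body in extract_equations(example):
    print(body)
```

Each returned string is the raw LaTeX body, which is usually less ambiguous than text recovered from the rendered PDF.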


kmm an hour ago | parent | prev [-]

I sort of understand the reasoning behind why arXiv prefers TeX to PDF[1], even though I feel it's a bit much to make it mandatory to submit the original TeX file if they detect that a submitted PDF was produced from one. But I've never understood what the added value is in hosting the source publicly.

Though I have to admit, when I was still in academia, whenever I saw a beautiful figure or formatting in a preprint, I'd often try to take some inspiration from the source for my own work, occasionally learning a new neat trick or package.

1: https://info.arxiv.org/help/faq/whytex.html

	irowe an hour ago | parent [-]
		A huge value in having authors upload the original source is that it divorces the content from the presentation (mostly). Having the original sources available was sufficient for a large majority of the corpus to be automatically rendered into HTML for easier reading on many devices: https://info.arxiv.org/about/accessible_HTML.html. I don't think it would have been as simple if they had to convert PDFs.