swiftcoder (6 hours ago): The "essentially static hosting" isn't the cost centre (although with 5 million MAU, it's nothing to sneeze at). The real costs are on the input side: they have an ingestion pipeline that ensures standardised paper formatting and so on, plus at least some degree of human review.
bonoboTP (6 hours ago): Do you mean that the CPU cost of compiling LaTeX into PDF/HTML is the main cost?
swiftcoder (6 hours ago): No, I mean that the pipeline requires software engineers to build and maintain it, and salaries are (as in basically every tech organisation) the dominant cost.
bonoboTP (5 hours ago): Then drop it and make people upload a PDF and a zip of the LaTeX sources. Most people I talk to hate that pipeline and spend a lot of hours debugging it when Arxiv can't compile what Overleaf and their local LaTeX install can.
domoritz (4 hours ago): Arxiv can recompile the LaTeX to support accessibility and HTML output. Going to PDF-only submissions would be a major step backward.
bonoboTP (3 hours ago): Make it an external service then, and let the thing that's already working great just be. The reason authors like and use arxiv is that it gives 1) a timestamp, 2) a standardized citable ID, and 3) stable hosting of the PDF. And readers like the no-nonsense, single-click PDF download and the barebones, consistent website. All else is a sideshow.
OneDeuxTriSeiGo (2 hours ago): You have to keep in mind that an increasing portion of their time and labor is going towards moderation and filtering due to a mass influx of nonsensical AI-generated papers, non-academic numerology-tier hackery, and other useless drivel. Spinning the service off forces that labor out onto other universities rather than leaving it solely to Cornell.
bonoboTP (an hour ago): Is the problem the storage cost for hosting them, the HDDs? I'm sure they can be offloaded to cold storage, because most of that slop won't be opened by anyone. Arxiv doesn't need moderation. Nobody is asking for Arxiv moderation. It needs minimal checks to remove overtly illegal content.
lou1306 (6 hours ago): The PDF formatting is anything but standardised. They ingest LaTeX sources, which are formatted according to the authors' whims (most likely, according to whatever journal or conference they just submitted the manuscript to).
I'll concede that the (relatively novel) HTML formatter gives papers a more uniform appearance. They also integrate a bunch of external services for, e.g., citation metrics and cross-references. Still hard to justify such a high operating cost, but eh. Also, the "human review" is a simple moderation process [1]. It usually does not dig into the submission's scientific merits.
[1] https://info.arxiv.org/help/moderation/index.html
OtherShrezzing (5 hours ago): I don't see it as an especially extravagant structure or budget. I've seen larger teams with bigger budgets struggle to maintain smaller applications. I've contracted into some consultancy teams that you could uncharitably describe as "15 people and $4mn/yr to create one PDF per month".
|