edent a day ago

About 60k academic citations about to die - https://scholar.google.com/scholar?start=90&q=%22https://goo...

Countless books with irrevocably broken references - https://www.google.com/search?q=%22://goo.gl%22&sca_upv=1&sc...

And for what? The cost of keeping a few TB online and a little bit of CPU power?

An absolute act of cultural vandalism.

toomuchtodo a day ago | parent | next [-]

https://wiki.archiveteam.org/index.php/Goo.gl

https://tracker.archiveteam.org/goo-gl/ (1.66B work items remaining as of this comment)

How to run an ArchiveTeam warrior: https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

(edit: I see jaydenmilne commented about this further down the thread, mea culpa)

progbits a day ago | parent | next [-]

They appear to be doing ~37k items per minute; with 1.6B remaining, that is roughly 30 days left. So that's just barely enough to finish in time.
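
A quick back-of-envelope check in Python, using the numbers above:

    remaining = 1.6e9          # work items left, per the tracker
    rate_per_minute = 37_000   # observed processing rate
    days = remaining / rate_per_minute / 60 / 24
    print(f"~{days:.0f} days")  # ~30 days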

Going to run the warrior over the weekend to help out a bit.

pentagrama a day ago | parent | prev | next [-]

Thank you for that information!

I wanted to help, so I set one up using VMware.

For the curious, here is what the UI looks like: there is a list of projects to choose from (I chose the goo.gl project) and a "Current project" tab that shows the project's activity.

Project list: https://imgur.com/a/peTVzyw

Current project: https://imgur.com/a/QVuWWIj

addandsubtract 12 hours ago | parent [-]

Also available as a Dockerfile, for those not running VMs: https://github.com/ArchiveTeam/warrior-dockerfile
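
If memory serves, the invocation documented on the ArchiveTeam wiki looks something like this (the published image path may have changed, so check the repo README first):

    docker run --detach \
      --name archiveteam-warrior \
      --publish 8001:8001 \
      --restart unless-stopped \
      atdr.meo.ws/archiveteam/warrior-dockerfile

The web UI for picking a project then lives at http://localhost:8001.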

xingped 8 hours ago | parent | prev [-]

For those in the know: is this heavy on disk usage? Should I install it on my hard drive or my SSD? I just want to avoid tons of writes on an SSD if it's unnecessary.

jlarocco a day ago | parent | prev | next [-]

IMO it's less Google's fault and more a crappy tech education problem.

It wasn't a good idea to use shortened links in a citation in the first place, and somebody should have explained that to the authors. They didn't publish a book or write an academic paper in a vacuum - somebody around them should have known better and said something.

And really it's not much different than anything else online - it can disappear on a whim. How many of those shortened links even go to valid pages any more?

And no company is going to maintain a "free" service forever. It's easy to say, "It's only ...", but you're not the one doing the work or paying for it.

justin66 a day ago | parent | next [-]

> It wasn't a good idea to use shortened links in a citation in the first place, and somebody should have explained that to the authors. They didn't publish a book or write an academic paper in a vacuum - somebody around them should have known better and said something.

It's a great idea, and today in 2025, papers are pretty much the only place where using these shortened URLs makes a lot of sense. In almost any other context you could just use a QR code or something, but that wouldn't fit an academic paper.

Their specific choice of shortened URL provider was obviously unfortunate. The real failure is that of DOI to provide an alternative to goo.gl or tinyurl or whatever that is easy to reach for. It's a big failure, since preserving references to things like academic papers is part of their stated purpose.

dingnuts a day ago | parent [-]

Even normal HTTP URLs aren't great. If there was ever a case for content-addressable networks like IPFS, it's this. Universities should be able to host this data in a decentralized way.
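
For anyone unfamiliar, a minimal Python sketch of what "content-addressable" means (the cas:// scheme is made up for illustration; real IPFS addresses are multihash-encoded CIDs):

    import hashlib

    # The address is derived from the bytes themselves, so any node
    # holding an identical copy can serve them, and the link can never
    # silently point at different content.
    content = b"the exact bytes of the cited PDF"
    digest = hashlib.sha256(content).hexdigest()
    print(f"cas://sha256/{digest}")

The flip side is that the address is as long as the hash, so compactness is lost.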

justin66 8 hours ago | parent | next [-]

A DOI handle type of thing could certainly point to an IPFS address. I can't speak to how you'd do truly decentralized access to the DOI handle. At some point DNS is a thing and somebody needs to host the handle.

nly a day ago | parent | prev [-]

CANs usually have complex hashy URLs, so you still have the compactness problem.

gmerc a day ago | parent | prev [-]

Ahh classic free market cop out.

bbuut 7 hours ago | parent | next [-]

Free market is a euphemism for “there’s no physics demanding this be worked on”

If you want it archived, do it. You seem to want someone else to take up your concerns.

An HN genius should be able to crawl this and fix it.

But you're not geniuses, and the geniuses are too busy to be low-affect whiners on social media.

jlarocco 7 hours ago | parent | prev | next [-]

Well, is the free market going anywhere?

Who's lost out at the end of the day? People who didn't understand the free market and lost access to these "free" services? Or people who knew what would happen and avoided them? My links are still working...

There are digital public goods (like Wikipedia) that are intended to stick around forever with free access, but Google isn't one of them.

FallCheeta7373 a day ago | parent | prev | next [-]

If the smartest among us, publishing in academia, cannot figure this out, then who will?

hammyhavoc 17 hours ago | parent [-]

Being smart in one field doesn't necessarily mean someone can solve problems in another.

I know some brilliant people, but, well, putting it kindly, they're as useful as a chocolate teapot outside of their specific area of academic expertise.

kazinator a day ago | parent | prev [-]

Nope! There have in fact been education campaigns about the evils of URL shorteners for years: how they pose security risks (used for shortening malicious URLs), and how they stop working when their domain is temporarily or permanently down.

The authors just had their heads too far up their academic asses to have heard of this.

epolanski a day ago | parent | prev | next [-]

Jm2c, but if your reference is a link to an online resource, that's borderline already (at any point the content can change or disappear).

It's even worse if your reference is a link shortened by some other service: you've just added yet another layer of unreliable indirection.

whatevaa a day ago | parent [-]

Citations are citations; if it's a link, you link to it. But using shorteners for that is silly.

ceejayoz a day ago | parent [-]

It's not silly if the link is a couple hundred characters long.

IanCal a day ago | parent | next [-]

Adding an external service so you don’t have to store a few hundred bytes is wild, particularly within a pdf.

ceejayoz a day ago | parent [-]

It's not the bytes.

It's the fact that it's likely gonna be printed in a paper journal, where you can't click the link.

SR2Z a day ago | parent | next [-]

I find it amusing that you are complaining about not having a computer to click a link while glossing over the fact that you need a computer to use a link at all.

This use case of "I have a paper journal and no PDF but a computer with a web browser" seems extraordinarily contrived. I have literally held a single-digit number of printed papers in my entire life while looking at thousands as PDFs. If we cared, we'd use a QR code.

This kind of luddite behavior sometimes makes using this site exhausting.

jtuple a day ago | parent | next [-]

Perhaps times have changed, but when I was in grad school circa 2010, smartphones and tablets weren't yet ubiquitous but laptops were. It was super common to sit in a cafe/library with a laptop and a stack of printed papers to comb through.

Reading on paper was more comfortable than reading on the screen, and it was easy to annotate, highlight, scribble notes in the margin, doodle diagrams, etc.

Do grad students today just use tablets with a stylus instead (iPad + Pencil, reMarkable Pro, etc.)?

Granted, post grad school I don't print much anymore, but that's mostly due to a change in use case. At work I generally read 1-5 papers a day tops, which is small enough to just do on a computer screen (and I have less need to annotate, etc.). Quite different than the 50-100 papers/week plus deep analysis expected in academia.

Incipient 21 hours ago | parent [-]

> Perhaps times have changed, but when I was in grad school circa 2010, smartphones and tablets weren't yet ubiquitous but laptops were. It was super common to sit in a cafe/library with a laptop and a stack of printed papers to comb through.

I just had a really warm feeling of nostalgia reading that! I was a pretty average student, and the material was sometimes dull, but the coffee was nice, life had little stress (in comparison) and everything felt good. I forgot about those times haha. Thanks!

ceejayoz a day ago | parent | prev | next [-]

> I have literally held a single-digit number of printed papers in my entire life while looking at thousands as PDFs.

This is by no means a universal experience.

People still get printed journals. Libraries still stock them. Some folks print out reference materials from a PDF to take to class or a meeting or whatnot.

SR2Z a day ago | parent [-]

And how many of those people then proceed to type those links into their web browsers, shortened or not?

Sure, contributing to link rot is bad, but in the same way that throwing out spoiled food is bad. Sometimes you've just gotta break a bunch of links.

ceejayoz a day ago | parent [-]

> And how many of those people then proceed to type those links into their web browsers, shortened or not?

That probably depends on the link's purpose.

"The full dataset and source code to reproduce this research can be downloaded at <url>" might be deeply interesting to someone in a few years.

epolanski a day ago | parent [-]

So he has a computer and can click.

In any case, a paper should not rely on ephemeral resources like internet links.

Have you ever tried to navigate to the errata page of a computer science book? It's one single book, with one single link, and it's dead anyway.

JumpCrisscross a day ago | parent [-]

I’m unconvinced the researchers acted irresponsibly. If anything, a Google-shortened link looks—at first glance—more reliable than a PDF hosted god knows where.

There are always dependencies in citations. Unless a paper comes with its citations embedded, splitting hairs over whether one untrustworthy provider is more untrustworthy than another is silly.

ycombinatrix a day ago | parent [-]

The Google shortened link just redirects you to the PDF hosted god knows where...

andrepd a day ago | parent | prev | next [-]

I feel like all that is beside the point. People used goo.gl because they largely are not tech specialists and aren't really aware of link rot, or of a Google decision rendering those links inaccessible.

SR2Z a day ago | parent [-]

> People used goo.gl because they largely are not tech specialists and aren't really aware of link rot, or of a Google decision rendering those links inaccessible.

Anyone who is savvy enough to put a link in a document is well aware that links don't work forever, because anyone who has ever clicked a link from a document has encountered a dead link. It's not 2005 anymore; the internet has accumulated plenty of dead links.

a day ago | parent | next [-]
[deleted]
andrepd a day ago | parent | prev [-]

Very much an xkcd.com/2501 situation

reaperducer a day ago | parent | prev [-]

> This kind of luddite behavior sometimes makes using this site exhausting.

We have many paper documents from over 1,000 years ago.

The vast majority of what was on the internet 25 years ago is gone forever.

eviks 16 hours ago | parent | next [-]

What a weird comparison. Do we have the vast majority of paper documents from 1,000 years ago?

SR2Z 8 hours ago | parent [-]

We certainly have more paper documents from 1000 years ago than PDFs from 1000 years ago! Clearly that's the fault of the PDFs.

epolanski a day ago | parent | prev [-]

25?

Try going back 6-7 years on this very website; half the links are dead.

IanCal 12 hours ago | parent | prev | next [-]

That’s an even worse reason to use a temporary redirection service. If you really need to, put in both.

leumon a day ago | parent | prev [-]

Which makes URL shorteners even more attractive for printed media, because you don't have to type as many characters manually.

epolanski a day ago | parent | prev [-]

Fix that at the presentation layer (PDFs, Word files, etc. support links), not the data one.

ceejayoz a day ago | parent [-]

Let me know when you figure out how to make a printed scientific journal clickable.

epolanski a day ago | parent | next [-]

Scientific journals should not rely on ephemeral data on the internet. It doesn't even matter how long the URL is.

Just buy any scientific book and try to navigate to its errata link printed in the book. It's always dead.

diatone a day ago | parent | prev [-]

Take a photo on your phone; the OS recognises the link in the image and makes it clickable. Done. Or use a QR code instead.
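
Generating one is nearly a one-liner with the qrcode package in Python (the URL here is a placeholder):

    import qrcode  # pip install qrcode[pil]

    # Embed the resulting PNG next to the printed citation.
    img = qrcode.make("https://example.org/long/citation/url")
    img.save("citation-qr.png")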

ceejayoz a day ago | parent | next [-]

https://news.ycombinator.com/item?id=9224

jeeyoungk a day ago | parent | prev | next [-]

This is the answer. It turns out that plain, untransformed links are the most generic data format, with no "compression" (QR codes or a third-party intermediary) needed.

13 hours ago | parent | prev [-]
[deleted]
zffr a day ago | parent | prev | next [-]

For people wanting to include URL references in things like books, what’s the right approach to take today?

I'm genuinely asking. It seems like it's hard to trust that any service will remain running for decades.

toomuchtodo a day ago | parent | next [-]

https://perma.cc/

It is built for the task, and assuming the worst-case scenario of sunset, it would be ingested into the Wayback Machine. Note that both the Internet Archive and Cloudflare are supporting partners (bottom of page).

(https://doi.org/ is also an option, but not as accessible to a casual user; the DOI Foundation pointed me to https://www.crossref.org/ for ad hoc DOI registration, although I have not had time to research further)

afandian 3 hours ago | parent | next [-]

Crossref is designed for publishing workflows, not ad hoc DOI registration; not least because registering a persistent identifier that redirects to an ephemeral page, without arrangements for preservation and stewardship of the page, doesn't make much sense.

That's not to say that DOIs aren't registered for all kinds of URLs. I found the likes of YouTube, etc., when I researched this about 10 years ago.

toomuchtodo an hour ago | parent [-]

Would you have a recommendation for an organization that can register ad hoc DOIs? I am still looking for one.

ruined a day ago | parent | prev | next [-]

perma.cc is an interesting project, thanks for sharing.

Other readers may be specifically interested in their contingency plan:

https://perma.cc/contingency-plan

Hyperlisk a day ago | parent | prev | next [-]

perma.cc is great. Also check out their tools if you want to get your hands dirty with your own archival process: https://tools.perma.cc/

whoahwio a day ago | parent | prev [-]

While Perma is a solution built specifically for this problem, and a good one at that, citing the might of the backing company is a bit ironic here.

toomuchtodo a day ago | parent [-]

If Cloudflare provides the infra (thanks Cloudflare!), I am happy to have them provide the compute and network for the lookups (which, at their scale, is probably a rounding error), with the Internet Archive remaining the storage system of last resort. Is that different than the Internet Archive offering compute to provide the lookups on top of their storage system? Everything is temporary, intent is important, etc. Can always revisit the stack as long as the data exists on disk somewhere accessible.

This is distinct from Google saying "bye y'all, no more GETs for you" with no other way to access the data.

whoahwio a day ago | parent [-]

This is much better positioned for longevity than google’s URL shortener, I’m not trying to make that argument. My point is that 10-15 years ago, when Google’s URL shortener was being adopted for all these (inappropriate) uses, its use was supported by a public opinion of Google’s ‘inevitability’. For Perma, CF serves a similar function.

toomuchtodo a day ago | parent [-]

Point taken.

edent a day ago | parent | prev | next [-]

The full URL to the original page.

You aren't responsible if things go offline. No more than if a publisher stops reprinting books and the library copies all get eaten by rats.

A reader can assess the URL for trustworthiness (is it scam.biz or legitimate_news.com?), look at the path to hazard a guess at the metadata and contents, and, finally, look it up in an archive.

firefax a day ago | parent | next [-]

> The full URL to the original page.

I thought that was the standard in academia? I've had reviewers chastise me when I did not use the Wayback Machine to archive a citation and link to that, since listing a "date retrieved" doesn't do jack if there's no IA copy.

Short links were usually given in addition to full URLs, and more in conference presentations than in the papers themselves.

afandian 2 hours ago | parent [-]

See also Memento: https://arxiv.org/abs/0911.1112

grapesodaaaaa a day ago | parent | prev [-]

I think this is the only real answer. Shorteners might work for things like old Twitter, where characters were at a premium, but I would rather see the whole URL.

We’ve learned over the years that they can be unreliable, security risks, etc.

I just don’t see a major use-case for them anymore.

danelski a day ago | parent | prev | next [-]

The real URL, plus saving the website in the Internet Archive as it was on the date of access?
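
For what it's worth, the Wayback Machine exposes a "Save Page Now" endpoint, so archiving at the date of access can be scripted. A minimal sketch (the cited URL is a placeholder, and heavy use may be rate-limited):

    import requests

    url = "https://example.org/cited/page"
    # Ask the Wayback Machine to capture the page now; on success the
    # final response URL points at the new snapshot.
    r = requests.get("https://web.archive.org/save/" + url, timeout=120)
    print(r.status_code, r.url)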

AbstractH24 11 hours ago | parent | prev [-]

What's the right approach to take for referencing anything that isn't preserved in an institution like the Library of Congress?

Say, an interview with a person, a niche publication, a local pamphlet?

Maybe, to certify that your article is of a certain level of credibility, you need to manually preserve all the cited works yourself in an approved way.

kazinator a day ago | parent | prev | next [-]

The act of vandalism occurs when someone creates a shortened URL, not when the links stop working.

djfivyvusn a day ago | parent | prev | next [-]

The vandalism was relying on Google.

toomuchtodo a day ago | parent | next [-]

You'd think people would learn. Ah, well. Hopefully we can do better from lessons learned.

api a day ago | parent | prev [-]

The web is a crap architecture for permanent references anyway. A link points to a server, not e.g. a content hash.

The simplicity of the web is one of its virtues but also leaves a lot on the table.

justinmayer a day ago | parent | prev | next [-]

In the first segment of the very first episode of the Abstractions podcast, we talked about Google killing its goo.gl URL obfuscation service and why it is such a craven abdication of responsibility. Have a listen, if you’re curious:

Overcast link to relevant chapter: https://overcast.fm/+BOOFexNLJ8/02:33

Original episode link: https://shows.arrowloop.com/@abstractions/episodes/001-the-r...

SirMaster a day ago | parent | prev | next [-]

Can't someone just go through programmatically right now and build a list of all these links and where they point? And then put the list up somewhere that everyone can consult if they need to?
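
Resolving one short link is easy; the hard part is enumerating ~1.6B codes before the shutdown without getting rate-limited, which is exactly what ArchiveTeam's distributed warriors are for. A sketch of the per-link step (the short code here is hypothetical):

    import requests

    def resolve(short_url: str) -> str | None:
        # Request the redirect without following it; the Location
        # header carries the original destination.
        r = requests.get(short_url, allow_redirects=False, timeout=10)
        if r.status_code in (301, 302):
            return r.headers.get("Location")
        return None  # dead, removed, or blocked

    print(resolve("https://goo.gl/XXXXXX"))  # hypothetical short code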

spixy 13 hours ago | parent [-]

Yes: https://tracker.archiveteam.org/goo-gl

QuantumGood a day ago | parent | prev | next [-]

When they began offering this, their reputation for ending services was already so bad that I refused to consider goo.gl. It's amazing how many services with large user bases they have introduced and then ended over the years. Gmail being in "beta" for five years was, weirdly, to me, a sign they might stick with it.

crossroadsguy a day ago | parent | prev | next [-]

I have always struggled with this. If I buy a book, I don't want an online/URL reference in it. Put the book, author, ISBN, page, etc. Or refer to the magazine/newspaper/journal issue, page, author, etc.

BobaFloutist a day ago | parent [-]

I mean preferably do both, right? The URL is better for however long it works.

SoftTalker a day ago | parent [-]

We are long, long past any notion that URLs are permanent references to anything. Better to cite with title, author, and publisher so that maybe a web search will turn it up later. The original URL will almost certainly be broken after a few years.

eviks 16 hours ago | parent | prev | next [-]

> And for what? The cost of keeping a few TB online and a little bit of CPU power?

For the immeasurable benefits of educating the public.

lubujackson 18 hours ago | parent | prev | next [-]

Truly, the most Googly of sunsets.

jeffbee a day ago | parent | prev | next [-]

While an interesting attempt at an impact statement, 90% of the results on the first two pages for me are not references to goo.gl shorteners, but are instead OCR errors or just gibberish. One of the papers is from 1981.

asdll 21 hours ago | parent | prev | next [-]

> An absolute act of cultural vandalism.

It makes me mad also, but something we have to learn the hard way is that nothing in this world is permanent. Never, ever depend on any technology to persist. Not even URLs to original hosts should be required. Inline everything.

nikanj a day ago | parent | prev | next [-]

The cost of dealing with and supporting an old codebase, instead of burning it all and releasing a written-from-scratch replacement next year.

garyHL a day ago | parent | prev | next [-]

[dead]

bugsMarathon88 a day ago | parent | prev | next [-]

[flagged]

edent a day ago | parent | next [-]

Gosh! It is a pity Google doesn't hire any smart people who know how to build a throttling system.

Still, they're a tiny and cash-starved company so we can't expect too much of them.

acheron a day ago | parent | next [-]

Must not be any questions about that in Leetcode.

lyu07282 a day ago | parent | prev | next [-]

It's almost as if, once a company becomes this big, burning it to the ground would be better for society or something. That would be the liberal position on monopolies, if they actually believed in anything.

bugsMarathon88 a day ago | parent | prev [-]

It is a business, not a charity. Adjust your expectations accordingly, or expect disappointment.

quesera a day ago | parent | prev | next [-]

Modern webservers are very, very fast on modern CPUs. I hear Google has some CPU infrastructure?

I don't know if GCP has a free tier like AWS does, but 10kQPS is likely within the capability of a free EC2 instance running nginx with a static redirect map. Maybe splurge for the one with a full GB of RAM? No problem.
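
For illustration, a static redirect map in nginx looks roughly like this (the entry is hypothetical; with ~1.6B entries you would generate the map offline from a database dump and tune nginx's map hash sizes, or shard the config):

    # http context
    map $uri $goo_target {
        default  "";
        /aBcD12  https://example.org/some/very/long/path;
    }

    server {
        listen 80;
        if ($goo_target) {
            return 301 $goo_target;
        }
        return 404;
    }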

bbarnett a day ago | parent [-]

You could deprecate the service and archive the links as static HTML: ~200 bytes of text for an HTML redirect (not JS).
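
A sketch of generating that static tree in Python (the mapping here is hypothetical; the real pairs would come from a dump of the shortener's database):

    # One tiny static page per short code: a meta refresh, no JS.
    PAGE = ('<!doctype html><meta charset="utf-8">'
            '<meta http-equiv="refresh" content="0; url={url}">'
            '<a href="{url}">{url}</a>')

    mapping = {"aBcD12": "https://example.org/some/very/long/path"}

    for code, url in mapping.items():
        with open(code + ".html", "w", encoding="utf-8") as f:
            f.write(PAGE.format(url=url))  # roughly 200 bytes per page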

You can serve immense volumes of traffic from static HTML. A single hardware server could easily do the job.

Your attack surface is also tiny without a back-end interpreter.

People will chime in with redundancy, but the point is Google could stop maintaining the ingress, and still not be douches about existing urls.

But... you know, it's Google.

quesera a day ago | parent [-]

Exactly. I've seen goo.gl URLs in printed books. Obviously in old blog posts too. And in government websites. Nonprofit communications. Everywhere.

Why break this??

Sure, deprecate the service. Add no new entries. This is a good idea anyway; link shorteners are bad for the internet.

But breaking all the existing goo.gl URLs seems bizarrely hostile, and completely unnecessary. It would take so little to keep them up.

You don't even need HTML files. The full set of static redirects can be configured into the webserver. No deployment hassles. The filesystem can be RO to further reduce attack surface.

Google is acting like they are a one-person startup here.

Since they are not a one-person startup, I do wonder if we're missing the real issue. Like legal exposure, or implication in some kind of activity that they don't want to be a part of, and it's safer/simpler to just delete everything instead of trying to detect and remove all of the exposure-creating entries.

Or maybe that's what they're telling themselves, even if it's not real.

bugsMarathon88 21 hours ago | parent [-]

> Why break this??

We already told you: people are likely brute-forcing URLs.

quesera 20 hours ago | parent [-]

I'm not sure why that is a problem.

nomel a day ago | parent | prev [-]

Those numbers make it seem fairly trivial. You have a dozen bytes referencing a few hundred bytes, for a service that is not latency-sensitive.
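
Back of the envelope (entry count from the tracker upthread; the average URL length is a guess):

    entries = 1.66e9            # work items, per the ArchiveTeam tracker
    bytes_per_entry = 12 + 300  # short code + a generous average target URL
    print(f"~{entries * bytes_per_entry / 1e12:.1f} TB")  # ~0.5 TB, before compression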

This sounds like a good project for an intern, with server costs that might even reach a hundred dollars per month!

oyveybro a day ago | parent | prev [-]

[flagged]