bbor 4 days ago

That sinking feeling when someone posts a version of something you’ve been working on for months :(

Congrats to the dev regardless, if you’re in here! Looks great, love the front end especially. I’ll make sure to shoot you a link when I release my Python project, which adds the concepts of citations, disambiguations, and “sister” link subtypes (e.g. “main article”, “see also”, etc.), along with a few other things. It doesn’t run anywhere close to as fast as yours, tho!! 2h for processing a wiki dump is damn impressive.
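For anyone curious what pulling “main article” / “see also” link subtypes out of raw wikitext might look like, here's a rough Python sketch. It assumes the hatnotes use the `{{Main|...}}` and `{{See also|...}}` templates and treats every other `[[...]]` as a plain body link. The function name `classify_links` is made up, and the sketch ignores nested templates and redirects and doesn't filter out `File:`/`Category:` links, so it's not how the project above actually works.

```python
import re
from collections import defaultdict

# Plain wikilinks: [[Target]], [[Target|label]], [[Target#Section|label]]
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Hatnote templates such as {{Main|A}} or {{See also|A|B}} (no nesting handled)
HATNOTE = re.compile(r"\{\{\s*(main|see also)\s*\|([^{}]*)\}\}", re.IGNORECASE)


def classify_links(wikitext: str) -> dict[str, list[str]]:
    """Group link targets in a wikitext string by a rough subtype."""
    links = defaultdict(list)
    for m in HATNOTE.finditer(wikitext):
        subtype = m.group(1).lower()
        for arg in m.group(2).split("|"):
            arg = arg.strip()
            # Skip empty args and named parameters such as l1=...
            if arg and "=" not in arg:
                links[subtype].append(arg)
    for m in WIKILINK.finditer(wikitext):
        links["body"].append(m.group(1).strip())
    return dict(links)


text = (
    "{{Main|Mount Everest}}\n{{See also|Himalayas|K2}}\n"
    "Named after [[George Everest|Sir George]] in [[India#History]]."
)
print(classify_links(text))
```

A real dump parser would need a proper template parser rather than regexes, but this shows the basic split between hatnote links and in-text links.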

Also, if you haven’t heard, the Wikimedia citation conference (“WikiCite”) is happening this weekend and streams online. Might be worth shooting this project over to them, they’d love it! https://meta.m.wikimedia.org/wiki/WikiCite_2025

graypegg 4 days ago

Just to throw it out there since you're looking to add other link subtypes in your script: https://www.wikidata.org/

If an entity has a Wikipedia article, that article is linked from its Wikidata entry. So whenever two linked articles' entities share an edge in Wikidata, you could use that edge to describe what relationship the article link represents!

For example: https://www.wikidata.org/wiki/Q513 (Mount Everest) has a "named after: George Everest" edge, and George Everest's article is linked from the Everest article. If you could match those up, I think that could add some interesting context to the graph!

Everest -- links to (named after) --> George Everest
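A rough Python sketch of that matching idea, assuming you've already extracted each item's English Wikipedia sitelink title and its claims from a Wikidata dump. The data below is hand-written toy data: Q513 and P138 ("named after") are real IDs, but `Q_EVEREST_PERSON` is a placeholder, not George Everest's actual QID, and `label_links` is a hypothetical helper. It only checks claims on the source item, so relations stated in the other direction are missed.

```python
# Hypothetical stand-ins for data pulled from a Wikidata dump.
SITELINKS = {
    "Q513": "Mount Everest",
    "Q_EVEREST_PERSON": "George Everest",  # placeholder ID, not the real QID
}
CLAIMS = {
    # item -> list of (property, target item); P138 is "named after"
    "Q513": [("P138", "Q_EVEREST_PERSON")],
}
PROPERTY_LABELS = {"P138": "named after"}


def label_links(source_title, linked_titles, sitelinks, claims, labels):
    """Annotate each article link with the Wikidata properties joining the two items."""
    title_to_qid = {title: qid for qid, title in sitelinks.items()}
    src = title_to_qid.get(source_title)
    out = []
    for target_title in linked_titles:
        dst = title_to_qid.get(target_title)
        # Only outgoing claims of the source item are checked here.
        props = [
            labels.get(prop, prop)
            for prop, obj in claims.get(src, [])
            if dst is not None and obj == dst
        ]
        out.append((source_title, target_title, props))
    return out


for src, dst, props in label_links(
    "Mount Everest", ["George Everest", "Himalayas"], SITELINKS, CLAIMS, PROPERTY_LABELS
):
    print(f"{src} -- links to ({', '.join(props) or 'plain link'}) --> {dst}")
```

Links with no matching Wikidata edge just fall back to "plain link", so the graph keeps every edge and only gains labels where Wikidata has something to say.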

bbor 3 days ago

Oh I'm very on board; thanks for spreading the good word! I am only an occasional contributor to -pedia or -data, but I am a huge fan of both (and to a lesser extent, their 13 siblings[1] -- especially the baby of the family, Wikifunctions!).

I'm guessing you know this, but for the passerby curious about Wikipedia drama:

Wikidata was founded back in 2012, shortly after Google bought its predecessor[2] (later shutting it down) to build the now-famous "Google Knowledge Graph". It continued a wave of interest in knowledge graphs going back to GOFAI (the "neat"[3] approach to AI), most famously advanced by Lenat's Cyc[4] as a path to intuitive algorithms. We obviously lost that particular war to the "scruffies" for good in 2022, but the well-known problems with LLMs highlight exactly why reliable, structured, efficient knowledge graphs are also needed.

The aforementioned drama is that the project to integrate Wikidata into Wikipedia's citations has basically been on pause since 2017 after a lot of arguing[5], and this weekend's scheduled discussion[6] seems passive at best. That's simply because the "editors" of Wikipedia, the people who spend countless hours researching content for free under strict rules, don't really care about AI paradigms! Specifically, they consider citing a work by its Wikidata ID, rather than writing out the full citation, to be dangerous.

Still, Wikidata is the "fastest growing wiki project" and backs a ton of Wikipedia stuff behind the scenes, such as the fancy infobox templates at the top right of pages. We've only got 1.65B statements compared to Google's AI-curated 500B facts, but I have faith that 2026 will be the year of Wikidata regardless!

After all, is a knowledge base curated with scruffy NLP models until it's incomprehensibly big still neat? ;)

[1] https://wikimediafoundation.org/what-we-do/wikimedia-project...

[2] https://en.wikipedia.org/wiki/Freebase_(database)

[3] [WARNING: 500KB PDF] https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...

[4] https://en.wikipedia.org/wiki/Cyc

[5] https://en.wikipedia.org/wiki/Wikipedia:Templates_for_discus...

[6] https://meta.wikimedia.org/wiki/WikiCite_2025/Proposals#Cite...

JohnKemeny 4 days ago

If you were working on this to be the first to do it, I have bad news...

One of our projects in algorithms/data structures was to do a BFS on the Wikipedia dump. In 2007.
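For reference, a BFS over a link graph is only a few lines. This Python sketch finds one shortest click-path between two articles, assuming the dump has already been parsed into an in-memory dict of title -> linked titles. The toy graph and the name `shortest_link_path` are made up for illustration; a full dump would need a far more compact representation than a dict of lists.

```python
from collections import deque


def shortest_link_path(graph, start, goal):
    """Breadth-first search over an article -> linked-articles mapping.

    Returns the titles on one shortest path, or None if goal is unreachable.
    """
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == goal:
                # Walk parent pointers back to the start, then reverse.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Toy link graph, not real dump data.
links = {
    "Mount Everest": ["George Everest", "Himalayas"],
    "George Everest": ["Surveyor General of India"],
    "Himalayas": ["Mount Everest", "K2"],
    "K2": ["Karakoram"],
}
print(shortest_link_path(links, "Mount Everest", "Karakoram"))
```

Keeping parent pointers instead of whole paths in the queue keeps memory proportional to the number of visited articles.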

dleeftink 4 days ago

This isn't zero-sum; we'd be very interested to see what you've built.
