threetonesun 8 hours ago

If we kill all the platforms where content for training LLMs comes from, what do LLMs train on?

InsideOutSanta 7 hours ago

This. I'm really bothered by the almost cruel glee with which a lot of people respond to SO's downfall. Yeah, the moderation was needlessly aggressive. But it was successful at creating a huge repository of text-based knowledge which benefited LLMs greatly. If SO is gone, where will this come from for future programming languages, libraries, and tools?

jrmg 7 hours ago

This always feels to me like an elephant in the room.

I’d love to read a knowledgeable roundup of current thought on this. I have a hard time understanding how, with the web becoming a morass of SEO and AI slop - with really no effort being put into keeping it accurate - we’ll be able to keep training LLMs to today's level in the future.

rvnx 8 hours ago

Newspapers, scientific papers and soon, real-world interactions.

News is the main feed of new data, and it can be an infinite, incremental source of new information.

threetonesun 7 hours ago

You talk about news here like it's some irrefutable ether LLMs can tap into. Also, I'd think newspapers and scientific papers cover extremely little of what the average person uses an LLM to search for.