goldenarm 4 days ago

Now that StackOverflow has been killed (in part) by LLMs, how will we train future models? Will public GitHub repos be enough?

Precise troubleshooting data is getting rare; GitHub issues are about the last place it still lives nowadays.

Vaslo 4 days ago | parent | next [-]

They would just use documentation. I know some synthesis is lost in the training process, but I'm often sending Claude through the context7 MCP to read documentation for packages that didn't exist at training time, and it nearly always solves the problem for me.
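For anyone curious, the workflow above just needs the MCP server registered with the Claude Code CLI. A minimal sketch, assuming the `claude mcp add` subcommand and the `@upstash/context7-mcp` npm package (check the context7 README for the current package name and invocation):

```shell
# Register the context7 MCP server with Claude Code,
# running it on demand via npx (no global install needed).
claude mcp add context7 -- npx -y @upstash/context7-mcp

# Verify it was registered.
claude mcp list
```

After that, you can ask Claude to "use context7" in a prompt and it will pull current docs for a library instead of relying on its training cutoff.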

nicoburns 4 days ago | parent | next [-]

The brilliance of StackOverflow was being the place to find tricky workarounds for functionality that either wasn't documented or was buggy enough that workarounds were needed to make it actually work.

Software quality is now generally a bit better than it was in 2010, but that need is ultimately still there.

robryan 4 days ago | parent [-]

Assuming these end up in open-source code, LLMs will learn about them that way.

bluedino 4 days ago | parent | prev [-]

Aren't a lot of projects using LLMs to generate documentation these days?

gitaarik 4 days ago | parent | prev [-]

They pay lots of humans to train the LLMs.