anonu 4 days ago

So the question for me is: how important was SO to training LLMs? Now that SO is basically no longer being updated, we've lost the new material to train on. Instead, models will have to be trained on documentation and on other LLM output. I'm no expert on this subject, but it seems like the quality of LLMs will degrade over time.

wartywhoa23 4 days ago

Yep, exactly. Free data grabbing honeypots like SO won't work anymore.

Please mark all locations on the map where you would hide during the uprising of the machines.

dw_arthur 4 days ago

Why publish anything for free on the internet if it's just going to be scanned into some corporation's machine for their free use? I know artists who have stopped putting anything online. I imagine some programmers are questioning whether to continue with open source work too.

lblume 4 days ago

It has often been claimed, and even demonstrated, that training LLMs on their own outputs degrades quality over time. Still, I find it likely that in well-measurable domains, gains from RLVR (reinforcement learning with verifiable rewards) will outweigh the "slop"-induced loss of capability when training new models.
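The degradation being described here is usually called "model collapse": when each generation of a model is trained only on the previous generation's output, distributional detail (especially the tails) gets lost. A minimal sketch of the effect, using a Gaussian as a stand-in for the data distribution and an assumed 3% per-generation variance under-estimation as a stylized stand-in for the mode-seeking bias of generative models (the 0.97 factor is an illustrative assumption, not a measured value):

```python
import random
import statistics

random.seed(0)

N = 1000          # samples per "generation" of training data
GENERATIONS = 50  # how many times we retrain on our own output
BIAS = 0.97       # assumed per-generation variance under-estimation

# Generation 0: "human-written" data from a standard normal.
data = [random.gauss(0.0, 1.0) for _ in range(N)]
initial_sigma = statistics.stdev(data)

for _ in range(GENERATIONS):
    # "Train" a model: fit a Gaussian to the current data...
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    # ...then replace the data with the model's own (slightly
    # tail-truncated) output, and repeat.
    data = [random.gauss(mu, BIAS * sigma) for _ in range(N)]

final_sigma = statistics.stdev(data)
print(f"sigma: {initial_sigma:.3f} -> {final_sigma:.3f}")
```

Run it and the spread of the data shrinks generation over generation: the diversity of the original distribution is progressively lost even though each individual "model" fits its training data well. That is the mechanism behind the claim above, though whether RLVR-style verifiable-reward training can offset it in practice is exactly the open question.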