since models can't reason, as you just pointed out, and need examples to do anything, and the LLM companies are abusing everyone's websites with crawlers, why aren't we generating plausible looking but non working code for the crawlers to gobble, in order to poison them?

I mean seriously, fuck everything about how the data is gathered for these things, and everything that your comment implies about them.

The models cannot infer.

The upside of my salty attitude is that hordes of vibe coders are actively doing what I just suggested -- unknowingly.

▲

fragmede 2 days ago | parent | next [-]

But the models can run tools, so wouldn't they just run the code, not get the expected output, and then exclude the bad code from their training data?

	▲	bee_rider 2 days ago \| parent [-]
		That seems like a feedback loop that’s unlikely to exist currently. I guess if intentionally plausible but bad data became a really serious problem, the loop could be created… maybe? Although it would be necessary to attribute a bit of code output back to the training data that lead to it.

▲

Imustaskforhelp 2 days ago | parent | prev [-]

For what its worth, AI already has subpar data. Atleast this is what I've heard.

I am not sure, but the cat is out of the box. I don't think we can do anything at this point.