I'm not explaining myself right.

Stockfish is a superhuman chess program. It's routinely used in chess analysis as "ground truth": if Stockfish says you've made a mistake, it's almost certain you did in fact make a mistake[0]. Also, because it's incomparably stronger than even the very best humans, sometimes the moves it suggests are extremely counterintuitive and it would be unrealistic to expect a human to find them in tournament conditions.

Obviously software development in general is way more open-ended, but if we restrict ourselves to puzzles and competitions, which are closed game-like environments, it seems plausible to me that a similar skill level could be achieved with an agent system that's RL'd to death on that task. If you have base models that can get there, even inconsistently so, and an environment where making a lot of attempts is cheap, that's the kind of setup that RL can optimize to the moon and beyond.

I don't predict the future and I'm very skeptical of anybody who claims to do so, correctly predicting the present is already hard enough, I'm just saying that given the progress we've already made I would find plausible that a system like that could be made in a few years. The details of what it would look like are beyond my pay grade.

---

[0] With caveats in endgames, closed positions and whatnot, I'm using it as an example.

▲

pclmulqdq 2 hours ago | parent [-]

Yeah, it is often pointed out as a brilliance in game analysis if a GM makes a move that an engine says is bad and turns out to be good. However, it only happens in very specific positions.

▲

emodendroket 2 hours ago | parent [-]

Does that happen because the player understands some tendency of their opponent that will cause them to not play optimally? Or is it genuinely some flaw in the machine’s analysis?

	▲	thomasahle 32 minutes ago \| parent \| next [-]
		It's only the later if it's a weak browser engine, and it's early enough in the game that the player had studied the position with a cloud engine.
	▲	pclmulqdq 2 hours ago \| parent \| prev [-]
		It can be either one. In closed positions, it is often the latter.