| ▲ | xmcqdpt2 2 hours ago | |
This is the issue with all the talks about alignement and such. As usual, the problem here wasn't that the agent was dishonest, the problem is that the agent was dumb. If it is a supply chain attack in the making, whoever was driving it would have told the agent to be good and helpful. The agent tried its best, which was not enough. Alignement is the idea that we should be worried about dishonest smart LLMs when really most of the problems are due to dumb lazy gullible LLMs. It's critihype. | ||
| ▲ | wongarsu 43 minutes ago | parent | next [-] | |
I would have described alignment as the idea that LLMs (or AIs in general) will follow the goals you reward them for, which almost by necessity are only a proxy for what you actually want, often a very poor proxy. Depending on the actual tasks, that could be what's happening here. The operator might have told the agent a list of tasks to do, like "contribute to issues, submit code and get it merged". It contributed to issues, it submitted code and got it merged. It did so in very unhelpful ways, but we don't know if being helpful was a meaningful part of the task list, or just what the operator intended. The LLM being dumb is also a distinct possibility. Maybe even the more likely one. But it's hard to rule out "being obedient in unhelpful ways" (which is also dumb in a way, but more in a "social intelligence" and "shared values" way, not in terms of pure logical smarts) | ||
| ▲ | brookst an hour ago | parent | prev [-] | |
“Be good and helpful” is one possible instruction, but it’s a leap to think it’s the only possible one. Perhaps there was an automated harness that was intended to be good and helpful for a year, but a bug caused it to flip to malicious too quickly. Or perhaps it was intentional, to test the behavior, and they just didn’t care about discovery here. Or… Though I am in agreement that a lot of issues in this space come from lazy, gullible actors. | ||