barishnamazov 4 hours ago
I'm a believer that LLMs will keep getting better. But even today (which may or may not count as "sufficient" training), they can easily run `rm -rf ~`. Not that humans can't make the same mistake (in fact, I have nuked my own home directory before), but I don't think it's a problem a few guardrails can currently solve. I'm looking for innovations (model-side or engineering-side) that do better than letting an agent run code until a goal is seemingly achieved.
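
For concreteness, here's a rough sketch (Python, all names hypothetical) of the kind of pattern-based guardrail I mean, and why it's so easy to sidestep:

```python
# A minimal sketch of a denylist-style guardrail: screen an agent's proposed
# shell command before executing it. Patterns and names are made up for
# illustration; this is not any real agent framework's API.
import re
import subprocess

# Hypothetical, obviously incomplete list of destructive command patterns.
DENY_PATTERNS = [
    r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f\b",   # rm -rf and friends
    r"\brm\s+-[a-zA-Z]*f[a-zA-Z]*r\b",   # rm -fr
    r"\bmkfs\b",                          # reformatting a filesystem
    r"\bdd\s+if=",                        # raw disk writes
]

def run_agent_command(cmd: str):
    """Refuse commands matching the denylist; otherwise run them in a shell."""
    for pattern in DENY_PATTERNS:
        if re.search(pattern, cmd):
            print(f"blocked: {cmd!r} matched {pattern!r}")
            return None
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

if __name__ == "__main__":
    run_agent_command("rm -rf ~")      # blocked
    run_agent_command("echo hello")    # runs
```

The weakness is exactly my point: trivial rewrites like `find ~ -delete`, or a destructive command assembled across several agent steps, sail right past this kind of filter.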