stavros 3 hours ago
I'd agree. I've been building a personal assistant (https://github.com/skorokithakis/stavrobot), and I'm amazed that, for the first time ever, LLMs manage to build reliably, with far fewer bugs than I'd expect from a human, and without the repo devolving into unmaintainability after a few cycles. It's really amazing; we've crossed a threshold, and I don't know what that means for our jobs.
Grimblewald 2 hours ago | parent
"No bugs" means nothing if the bugs are merely hidden, and LLMs are great at hiding bugs and will absolutely fail to find some fairly critical ones. Your own repo, which is slop at best, fails to meet its core premise:

> Another AI agent. This one is awesome, though, and very secure.

It isn't secure. It took me less than three minutes to find a vulnerability. Start engaging with your own code; it isn't as good as you think it is.

Edit: I had Kimi "red team" it out of curiosity. It found the main critical vulnerability I did, plus several others:

| Severity | Count | Categories |
|----------|-------|------------|
| Critical | 2 | SQL Injection, Path Traversal |
| High | 4 | SSRF, Auth Bypass, Privilege Escalation, Secret Exposure |
| Medium | 3 | DoS, Information Disclosure, Injection |

You need to sit down and really think about what the people who do know what they're doing are saying. You're going to get yourself into deep trouble with this. I'm not a security specialist; I take a recreational interest in security, and LLMs are by no means experts. A human with skill and intent would, I'd gamble, be able to fuck your shit up in a major way.
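To make the SQL Injection category above concrete: this is a generic, hypothetical sketch (not code from the repo in question) of how string-interpolated queries let attacker input escape the intended WHERE clause, and how a parameterized query closes the hole. The table name and payload are invented for illustration.

```python
import sqlite3

def find_user_vulnerable(conn, username):
    # BAD: user input is spliced directly into the SQL text,
    # so it can inject its own SQL syntax.
    cur = conn.execute(f"SELECT id FROM users WHERE name = '{username}'")
    return cur.fetchall()

def find_user_safe(conn, username):
    # GOOD: a parameterized query treats the input as data, never as SQL.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# Toy in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Classic tautology payload: the OR clause makes the WHERE match every row.
payload = "nobody' OR '1'='1"
print(len(find_user_vulnerable(conn, payload)))  # 2 -- every row leaks
print(len(find_user_safe(conn, payload)))        # 0 -- no user has that name
```

The point isn't this particular bug; it's that an LLM happily emits the first form because it "works" on happy-path input, and nothing in its own testing loop ever feeds it the second input.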