stavros 3 hours ago

I'd agree. I've been building a personal assistant (https://github.com/skorokithakis/stavrobot) and I'm amazed that, for the first time ever, LLMs manage to build it reliably, with far fewer bugs than I'd expect from a human, and without the repo devolving into unmaintainability after a few cycles.

It's really amazing; we've crossed a threshold, and I don't know what that means for our jobs.

Grimblewald 2 hours ago

"No bugs" means nothing if the bugs are merely hidden, and LLMs are great at hiding bugs; they will absolutely fail to find some fairly critical ones. Your own repo, which is slop at best, fails to meet its core premise:

> Another AI agent. This one is awesome, though, and very secure.

It isn't secure. It took me less than three minutes to find a vulnerability. Start engaging with your own code; it isn't as good as you think it is.

Edit: I had Kimi "red team" it out of curiosity; it found the main critical vulnerability I did, plus several others:

  Severity | Count | Categories
  Critical |     2 | SQL Injection, Path Traversal
  High     |     4 | SSRF, Auth Bypass, Privilege Escalation, Secret Exposure
  Medium   |     3 | DoS, Information Disclosure, Injection
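To make the critical category concrete, here is a minimal sketch of the SQL-injection class flagged above. This is a hypothetical illustration, not code from stavrobot; the table, column names, and payload are invented for the demo. The unsafe version splices user input into the query string, while the safe version uses the driver's parameter substitution:

```python
import sqlite3

# Hypothetical example database; names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (user TEXT, body TEXT)")
conn.execute("INSERT INTO notes VALUES ('alice', 'secret plans')")

def get_notes_unsafe(user):
    # Vulnerable: user input is spliced into the SQL string, so an
    # input like "' OR '1'='1" matches every row in the table.
    return conn.execute(
        f"SELECT body FROM notes WHERE user = '{user}'"
    ).fetchall()

def get_notes_safe(user):
    # Parameterized query: the driver treats the value as data, not
    # SQL, so the same payload simply matches no user.
    return conn.execute(
        "SELECT body FROM notes WHERE user = ?", (user,)
    ).fetchall()

payload = "' OR '1'='1"
print(get_notes_unsafe(payload))  # leaks alice's row
print(get_notes_safe(payload))    # []
```

The fix is a one-character change at the call site, which is exactly why this class of bug is easy for a code-generating model to introduce and easy for a reviewer to miss.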

You need to sit down and really think about what people who do know what they're doing are saying. You're going to get yourself into deep trouble with this. I'm not a security specialist, I take a recreational interest in security, and LLMs are by no means experts. A human with skill and intent would, I'd gamble, be able to fuck your shit up in a major way.

reedf1 30 minutes ago

Build a redteam into your feedback mechanism. Seriously. You've identified the problem and even solved it. Now automate it.
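One way to automate that loop is a CI gate that runs a red-team prompt over each diff and fails the build on severe findings. A minimal sketch, assuming some `ask_model()` call to an LLM API (stubbed here with canned findings, since the real API and its response format are assumptions); the severity gating is the part that matters:

```python
import json
import sys

# Prompt and JSON schema are illustrative assumptions, not a real API
# contract with any particular model.
RED_TEAM_PROMPT = (
    "You are a security red team. Audit this diff and return JSON: "
    '[{"severity": "critical|high|medium|low", "title": "..."}]'
)

def ask_model(prompt, diff):
    # Stub standing in for a real LLM call; returns canned findings
    # so the gating logic below can be exercised end to end.
    return json.dumps([
        {"severity": "critical", "title": "SQL injection in notes query"},
        {"severity": "medium", "title": "Verbose error discloses paths"},
    ])

def gate(diff, fail_on=frozenset({"critical", "high"})):
    """Return a process exit code: nonzero if any blocking finding."""
    findings = json.loads(ask_model(RED_TEAM_PROMPT, diff))
    blockers = [f for f in findings if f["severity"] in fail_on]
    for f in blockers:
        print(f"[{f['severity'].upper()}] {f['title']}")
    return 1 if blockers else 0  # nonzero exit fails the CI job

if __name__ == "__main__":
    sys.exit(gate(sys.stdin.read()))
```

Wire it in as a required check and the "three minutes to find a vulnerability" pass happens on every commit instead of once, after the fact.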