| ▲ | cadamsdotcom 5 days ago |
| All these new tools are so exciting, but running untrusted code that auto-updates itself is blocking me from trying them. I wish for a vetting tool: have an LLM examine the code and write a spec of what it reads and writes, which you can review before running it. If something in the list is suspect, you’ll know before you’re hosed, not after :) |
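Roughly what I have in mind, as a minimal sketch (ask_llm() and every path here are hypothetical placeholders, not an existing tool or API):

```python
import json
import pathlib

def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for whichever model API you trust.
    It should return JSON text describing the code's behavior."""
    raise NotImplementedError

def audit(package_dir: str) -> dict:
    # Concatenate the package source (Python files only here, for brevity).
    source = "\n\n".join(
        f"# {path}\n{path.read_text(errors='replace')}"
        for path in pathlib.Path(package_dir).rglob("*.py")
    )
    prompt = (
        "List every file path this code reads or writes, every network "
        "host it contacts, and every subprocess it spawns. Respond as JSON "
        "with the keys 'reads', 'writes', 'network', 'exec'.\n\n" + source
    )
    return json.loads(ask_llm(prompt))

# Review the resulting spec by hand before ever running the tool:
# anything touching ~/.ssh, ~/.aws, or an unfamiliar host is a red flag.
```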
|
| ▲ | nothrabannosir 5 days ago | parent | next [-] |
| Throwing more LLM at a prompt escaper is like throwing more regexp at an HTML parser. If the first LLM wasn’t enough, the second won’t be either. You’re in the wrong layer. |
| |
| ▲ | scroogey 5 days ago | parent | next [-] | | Here's an alternative perspective: https://x.com/rauchg/status/1949197451900158444 I'm not a professional developer (though Guillermo certainly is), so take this with a huge grain of salt, but I like the idea of an AI "trained" on security vulnerabilities as a second, third, and fourth set of eyes! | | |
| ▲ | aprilthird2021 5 days ago | parent | next [-] | | You pretty much just linked to an ad for a vibe coding platform. If you don't know what you're doing, you are going to make more security mistakes. Throwing LLMs into it doesn't increase your "know what you're doing" meter. | |
| ▲ | ffsm8 5 days ago | parent | prev [-] | | I'm not sure how to take that seriously given the current reality, where almost all security findings by LLM tools are false positives. I suspect it will work well enough on synthetic examples for naive and uninformed people to be tricked into trusting it, but at the very least, current LLMs can't provide the stability needed for this to be useful. It might become viable with future models, but there is little value in discussing this approach now,
at least not until someone actually builds a PoC that works roughly as designed, without a 50-100% false positive rate. You can tolerate some false positives, but the rate has to be low enough for people to keep listening to it, which currently isn't the case. |
| |
| ▲ | mathgeek 4 days ago | parent | prev [-] | | While I agree with the idea of vetting things, I too get a chuckle when folks jump straight from "we can't trust this unknown code" to "let's trust AI to vet it for us". Done it myself. |
|
|
| ▲ | troupo 5 days ago | parent | prev | next [-] |
| > All these new tools are so exciting, Most of these tools are not that exciting. They are similar-looking TUIs around third-party models/LLM calls. What is the difference between this and https://opencode.ai? Or any of the half a dozen tools that appeared on HN in the past few weeks? |
|
| ▲ | lionkor 5 days ago | parent | prev | next [-] |
| that's cool and all, until you get malicious code that includes prompt injections and code that never runs but looks super legit. LLMs are NOT THOROUGH. Not even remotely. I don't understand how anyone can use LLMs and not see this instantly. I have yet to see an LLM get a failure rate better than around 50% in the real world with real-world expectations. Especially with code review, LLMs catch some things, miss a lot of things, and get a lot of things completely and utterly wrong. It takes someone wholly incompetent at code review to look at an LLM review and go "perfect!". Edit: Feel free to write a comment if you disagree |
| |
| ▲ | esafak 5 days ago | parent | next [-] | | They work better in small, well-commented code bases in popular languages. The further you stray from that, the less successful they are. That's on top of the quality of your prompt, of course. | |
| ▲ | jclardy 4 days ago | parent | prev | next [-] | | > I don't understand how anyone can use LLMs and not see this instantly Because people in general are not thorough. I've been playing around with Claude Code and, before that, Cursor. Both are great tools when targeted correctly. But I've also tried "vibe" coding with them, and it's obvious where people get fooled: it will build a really nice-looking shell of a product that appears to be working, but then you use it past the surface layer and issues start to show. Most people don't look past the surface layer, and instead keep digging in, having the agent build on the crappy foundation, until some time later it all falls apart. (And since a lot of these people aren't developers, they have also never heard of source control.) | |
| ▲ | resonious 5 days ago | parent | prev | next [-] | | If you know going in that LLMs are not thorough, then you can get your failure rates way lower than 50%. Of course, if you just paste a product spec into an LLM, it will do a bad job. If you build an intuition for what kinds of asks an LLM (agent, really) can do well, you can choose to give it only those tasks, and that's where the huge speedups come from. I don't know what to do about prompt injection, really. But "untrusted code" in the broader sense has always been a risk. If I download and use a library, the author already has free rein over my computer; they don't even need to think about messing with my LLM assistant. | |
| ▲ | stpedgwdgfhgdd 5 days ago | parent | prev [-] | | My suggestion is to try CC (Claude Code), use a language like Go, and read their blogs about how they use it internally. They are transparent about what works and what does not. |
|
|
| ▲ | Eggpants 4 days ago | parent | prev | next [-] |
| You can always chroot into the directory you're using to isolate the tools from the rest of your system. That is, unless you're using a toy operating system, of course. ;) |
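A rough sketch of that approach (assumes Linux coreutils chroot, root privileges, and a jail directory you have already populated with the tool's binaries and libraries; paths are placeholders, and chroot alone is not a hard security boundary):

```python
import subprocess

JAIL = "/srv/agent-jail"            # placeholder jail directory
CMD = ["/usr/bin/coding-agent"]     # placeholder binary path inside the jail

# Run the tool with the jail directory as its root, so it can only see
# that subtree of the filesystem.
subprocess.run(["chroot", JAIL, *CMD], check=True)
```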
|
| ▲ | adastra22 5 days ago | parent | prev [-] |
| Put it in a Docker container with a mounted git worktree? |
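Something like this, as a rough sketch (the repo path, worktree path, branch name, and image are placeholders):

```python
import subprocess

REPO = "/home/me/project"          # placeholder repo path
WORKTREE = "/tmp/agent-worktree"   # placeholder worktree path
IMAGE = "node:22"                  # any image that has your toolchain

# Create a scratch worktree so the agent only ever touches a throwaway branch.
subprocess.run(
    ["git", "-C", REPO, "worktree", "add", "-b", "agent/scratch", WORKTREE],
    check=True,
)

# Mount only the worktree into the container; nothing else on the host
# (dotfiles, credentials, other repos) is visible. Add "--network", "none"
# for full isolation if the tool doesn't need to reach its API.
subprocess.run(
    ["docker", "run", "--rm", "-it",
     "-v", f"{WORKTREE}:/work", "-w", "/work",
     IMAGE, "bash"],
    check=True,
)
```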
| |
| ▲ | dimava 5 days ago | parent [-] | | Aka VSCode DevContainer? Could work I think (be wary of sending .env to the web though) | | |
| ▲ | adastra22 4 days ago | parent [-] | | One way of doing it, yes. Why would your dev repo have any credentials in .env? |
|
|