▲ | lionkor 5 days ago | |
that's cool and all, before you get malicious code that includes prompt injections and code that never runs but looks super legit. LLMs are NOT THOROUGH. Not even remotely. I don't understand how anyone can use LLMs and not see this instantly. I have yet to see an LLM get a better failure rate than around 50% in the real world with real world expectations. Especially with code review, LLMs catch some things, miss a lot of things, and get a lot of things completely and utterly wrong. It takes someone wholly incompetent at code review to look at an LLM review and go "perfect!". Edit: Feel free to write a comment if you disagree | ||
▲ | esafak 5 days ago | parent | next [-] | |
They work better in small, well-commented code bases in popular languages. The further you stray from that the less successful they are. That's on top of the quality of your prompt, of course. | ||
▲ | jclardy 4 days ago | parent | prev | next [-] | |
> I don't understand how anyone can use LLMs and not see this instantly Because people in general are not thorough. I've been playing around with Claude Code and before that, Cursor. And both are great tools when targeted correctly. But I've also tried "Vibe" coding with them and it is obvious where people get fooled - it will build a really nice looking shell of a product that appears to be working, but then you step into using it past the surface layer and issues start to show. Most people don't look past the surface layer, and instead keep digging in having the agent build on the crappy foundation, until some time later it all falls apart (And since a lot of these people aren't developers, they have also never heard of source control.) | ||
▲ | resonious 5 days ago | parent | prev | next [-] | |
If you know that LLMs are not thorough going into it, then you can get your failure rates way lower than 50%. Of course if you just paste a product spec into an LLM, it will do a bad job. If you build an intuition for what kinds of asks an LLM (agent, really) can do well, you can choose to only give it those tasks, and that's where the huge speedups come from. Don't know what to do about prompt injection, really. But "untrusted code" in the broader sense has always been a risk. If I download and use a library, the author already has free reign of my computer - they don't even need to think about messing with my LLM assistant. | ||
▲ | stpedgwdgfhgdd 5 days ago | parent | prev [-] | |
My suggestion is to try CC, use a language like Go, and read their blogs how they use it internally. They are transparent what works and what does not work. |