BeetleB · a day ago
I've noticed an interesting trend: most people who are happy with LLM coding say something like "Wow, it's awesome. I asked it to do X and it did it so fast, with minimal bugs and good code," and occasionally show the output. Many provide even more details. Most people who are not happy with LLM coding provide almost no details.

As someone who's impressed by LLM coding, when I read a post like yours I tend to have a lot of questions, and generally the post doesn't have the answers.

1. What type of problem did you try it out with?

2. Which model did you use (you get points for providing that one!)?

3. Did you consider a better model (see how Gemini 2.5 Pro compares to Sonnet 3.7 on the Aider leaderboard)?

4. What were its failings? Buggy code? Correct code but poorly architected? Correct code, but using some obscure method to solve the problem rather than a canonical one?

5. Was it working on an existing codebase, or was this new code?

6. Did you manage how many tokens were sent? Did you use a tool that shows the number of tokens for each query?

7. Which tool did you use? It's not just a question of the model, but of how the tool handles the prompts/agents under it. Aider is different from Code, which is different from Cursor, which is different from Windsurf.

8. What strategy did you follow? Did you give it the broad spec and ask it to do everything? Did you work bottom-up and incrementally?

I'm not saying LLM coding is the best or that it can replace a human. But for certain use cases (e.g. a simple script, written from scratch), it's absolutely fantastic. I (mostly) don't use it on production code, but for the little peripheral scripts I need to write (at home or at work), it's great.

And that's why people like me wonder what people like you are doing differently. But such people aren't forthcoming with the details.
GolDDranks · a day ago
Two problems.

1) Writing a high-performance memory allocator for a game engine in Rust: https://github.com/golddranks/bang/tree/main/libs/arena/src (Still a work in progress, so it's in a bit of a messy state.)

It didn't seem to understand the design I had in mind and/or the requirements, went on tangents, and kept changing the design. In the end I coded the main code myself and used the LLM for writing tests, with some success. I had to remove tons of inane comments that didn't provide any explanatory value.

2) Trying to fix a Django ORM expression that generates suboptimal and incorrect SQL.

It constantly changed its opinion about whether something was even possible or supported by Django, apologized when I pointed out mistakes / bugs / hallucinations, but then proceeded not to internalize the implications of those mistakes.

I used the Zed editor with its recently published agentic features. I tried to prompt it with a chat-style discussion, but it often made bigger edits than I would have liked, and it failed to share a high-level plan in advance, something I often requested.

My biggest frustrations were not coding problems per se, but the general inability to follow instructions and see implications, and the lack of awareness to step back and ask for confirmation or better directions in "hold on, something's not right" moments. Also, it generally follows through with "thanks for pointing that out, you are absolutely right!" even when you are NOT right. That yes-man style seriously erodes trust in the output.
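For a sense of what an arena allocator like that boils down to, here is a stripped-down bump-allocator sketch, with the kind of unit test the LLM handled reasonably well. This is not the design in the linked repo; the names and details are made up for illustration.

    // Stripped-down bump arena: allocations come out of one fixed buffer
    // and are all released together with reset(). `align` must be a power of two.
    use std::cell::Cell;

    pub struct Arena {
        buf: Vec<u8>,
        offset: Cell<usize>,
    }

    impl Arena {
        pub fn with_capacity(cap: usize) -> Self {
            Arena { buf: vec![0; cap], offset: Cell::new(0) }
        }

        /// Hands out `size` bytes aligned to `align`, or None when the buffer is exhausted.
        pub fn alloc(&self, size: usize, align: usize) -> Option<&[u8]> {
            let start = (self.offset.get() + align - 1) & !(align - 1);
            let end = start.checked_add(size)?;
            if end > self.buf.len() {
                return None;
            }
            self.offset.set(end);
            Some(&self.buf[start..end])
        }

        /// Frees everything at once; that's the whole point of an arena.
        pub fn reset(&mut self) {
            self.offset.set(0);
        }
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn allocations_fit_and_capacity_is_respected() {
            let arena = Arena::with_capacity(64);
            let a = arena.alloc(16, 8).expect("first alloc fits");
            let b = arena.alloc(16, 8).expect("second alloc fits");
            assert_eq!((a.len(), b.len()), (16, 16));
            assert!(arena.alloc(64, 8).is_none(), "over-capacity alloc must fail");
        }
    }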
| ||||||||||||||||||||||||||||||||
sponnath · a day ago
I feel like the opposite is true, but maybe the issue is that we both live in separate bubbles. Often I see people on X and elsewhere making wild claims about the capabilities of AI, and rarely do they link to the actual output.

That said, I agree that AI has been amazing for fairly closed-ended problems like writing a basic script, or even writing scaffolding for tests (it's about 90% effective at producing tests I'd consider good, assuming you give it enough context).

Greenfield projects have been more of a miss than a hit for me. It starts out well, but if you don't do a good job of directing the architecture it can go off the rails pretty quickly. In a lot of cases I find it faster to write the code myself.
| ||||||||||||||||||||||||||||||||
fumeux_fume · a day ago
Aside from the fact that you seem to be demanding a lot from someone who's informally sharing their experience online, I think the effectiveness really depends on what you're writing code for. With straightforward use cases that have ample documented examples, you can generally expect decent or even excellent results. However, the more novel the task or the more esoteric the software library, the likelier you are to encounter issues and feel dissatisfied with the outcomes. Additionally, some people are simply pickier about code quality and won't accept suboptimal results.

Where I work, I regularly encounter wildly enthusiastic opinions about GenAI that lack any supporting evidence. Dissenting from the mainstream belief that AI is transforming every industry is treated as heresy, so such skepticism is best kept close to the chest, or better yet, completely to oneself.
| ||||||||||||||||||||||||||||||||
raincole · a day ago
Yeah, that's obvious. It's even worse for blog posts. Pro-LLM posts usually come with whole working toy apps and the prompts that were used to generate them. Anti-LLM posts are usually some logic puzzles with twists.

Anyway, that's the Internet for you. People will say with a straight face that LLMs have plateaued since 2022.
csomar · 20 hours ago
Maybe you are not reading what we are writing :) Here is an article of mine: https://omarabid.com/gpt3-now

> But for certain use cases (e.g. simple script, written from scratch), it's absolutely fantastic.

I agree with that. I've found it to be very useful for "yarn run xxx" scripts. It can automate lots of tasks that I wouldn't have bothered with previously, because the cost of coding the automation versus just doing them manually wasn't worth it.
| ||||||||||||||||||||||||||||||||
whimsicalism · 17 hours ago
I think these developments produce job/economic anxiety, so a certain percentage of people react this way; the percentage is even higher on Reddit, where there is a lot of job anxiety.
jiggawatts · a day ago
Reminds me of the early days of endless "ChatGPT can't do X" comments, where the commenters were invariably using 3.5 Turbo instead of 4, which was available to paying users only. My takeaway lesson from that was that humans are much lazier than AIs.