| |
| ▲ | wiremine 3 days ago | parent | next [-] | | > Having spent a couple of weeks on Claude Code recently, I arrived at the conclusion that the net value for me from agentic AI is actually negative. > For me it’s meant a huge increase in productivity, at least 3X. How do we reconcile these two comments? I think that's a core question of the industry right now. My take, as a CTO, is this: we're giving people new tools, and very little training on the techniques that make those tools effective. It's sort of like we're dropping trucks and airplanes on a generation that only knows walking and bicycles. If you've never driven a truck before, you're going to crash a few times. Then it's easy to say "See, I told you, this newfangled truck is rubbish." Those who practice with the truck are going to get the hang of it, and figure out two things: 1. How to drive the truck effectively, and 2. When NOT to use the truck... when walking or the bike is actually the better way to go. We need to shift the conversation to techniques, and away from the tools. Until we do that, we're going to be forever comparing apples to oranges and talking around each other. | | |
| ▲ | weego 3 days ago | parent | next [-] | | In a similar role and place with this. My biggest take so far: If you're a disciplined coder who can handle 20% of an entire project's (project being a bug through to an entire app) time being used on research, planning and breaking those plans into phases and tasks, then augmenting your workflow with AI appears to deliver large gains in productivity. Even then you need to learn a new version of explaining it 'out loud' to get proper results. If you're more inclined to dive in and plan as you go, and store the scope of the plan in your head because "it's easier that way", then AI 'help' will just fundamentally end up in a mess of frustration. | | |
| ▲ | t0mas88 3 days ago | parent | next [-] | | For me it has a big positive impact on two sides of the spectrum and not so much in the middle. One end is larger complex new features where I spend a few days thinking about how to approach it. Usually most thought goes into how to do something complex with good performance that spans a few apps/services. I write a half page high level plan description, a set of bullets for gotchas and how to deal with them and list normal requirements. Then let Claude Code run with that. If the input is good you'll get a 90% version and then you can refactor some things or give it feedback on how to do some things more cleanly. The other end of the spectrum is "build this simple screen using this API, like these 5 other examples". It does those well because it's almost advanced autocomplete mimicking your other code. Where it doesn't do well for me is in the middle between those two. Some complexity, not a big plan and not simple enough to just repeat something existing. For those things it makes a mess, or you end up writing a lot of instructions/prompts and could have just done it yourself. | |
| ▲ | cmdli 3 days ago | parent | prev [-] | | My experience has been entirely the opposite as an IC. If I spend the time to delve into the code base to the point that I understand how it works, AI just serves as a mild improvement in writing code as opposed to implementing it normally, saving me maybe 5 minutes on a 2 hour task. On the other hand, I’ve found success when I have no idea how to do something and tell the AI to do it. In that case, the AI usually does the wrong thing but it can oftentimes reveal to me the methods used in the rest of the codebase. | | |
| ▲ | zarzavat 3 days ago | parent | next [-] | | Both modes of operation are useful. If you know how to do something, then you can give Claude the broad strokes of how you want it done and -- if you give enough detail -- hopefully it will come back with work similar to what you would have written. In this case it's saving you on the order of minutes, but those minutes add up. There is a possibility for negative time saving if it returns garbage. If you don't know how to do something then you can see if an AI has any ideas. This is where the big productivity gains are, hours or even days can become minutes if you are sufficiently clueless about something. | | |
| ▲ | bcrosby95 3 days ago | parent | next [-] | | Claude will point you in the right neighborhood but to the wrong house. So if you're completely ignorant that's cool. But recognize that it's probably wrong and only a starting point. Hell, I spent 3 hours "arguing" with Claude the other day in a new domain because my intuition told me something was true. I brought out all the technical reasons why it was fine but Claude kept skirting around it saying the code change was wrong. After spending extra time researching it I found out there was a technical term for it and when I brought that up Claude finally admitted defeat. It was being a persistent little fucker before then. My current hobby is writing concurrent/parallel systems. Oh god AI agents are terrible. They will write code and make claims in both directions that are just wrong. | |
| ▲ | hebocon 3 days ago | parent | next [-] | | > After spending extra time researching it I found out there was a technical term for it and when I brought that up Claude finally admitted defeat. It was being a persistent little fucker before then. Whenever I feel like I need to write "Why aren't you listening to me?!" I know it's time for a walk and a change in strategy. It's also a good indicator that I'm changing too much at once and that my requirements are too poorly defined. | | | |
| ▲ | zarzavat 3 days ago | parent | prev [-] | | To give an example: a few days ago I needed to patch an open source library to add a single feature. This is a pathologically bad case for a human. I'm in an alien codebase, I don't know where anything is. The library is vanilla JS (ES5 even!) so the only way to know the types is to read the function definitions. If I had to accomplish this task myself, my estimate would be 1-2 days. It takes time to read code, get oriented, understand what's going on, etc. I set Claude on the problem. Claude diligently starts grepping, it identifies the source locations where the change needs to be made. After 10 minutes it has a patch for me. Does it do exactly what I wanted it to do? No. But it does all the hard work. Now that I have the scaffolding it's easy to adapt the patch to do exactly what I need. On the other hand, yesterday I had to teach Claude that writing a loop of { writeByte(...) } is not the right way to copy a buffer. Claude clearly thought that it was being very DRY by not having to duplicate the bounds check. I remain sceptical about the vibe coders burning thousands of dollars using it in a loop. It's hardworking but stupid.
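A minimal sketch of the buffer-copy point above, in TypeScript; the function names are illustrative, not from the original code:

```typescript
// The byte-at-a-time pattern described above: one bounds-checked write per byte.
function copyLoop(src: Uint8Array, dst: Uint8Array, dstOffset: number): void {
  for (let i = 0; i < src.length; i++) {
    dst[dstOffset + i] = src[i];
  }
}

// A single bulk copy does the same job in one call (and one range check).
function copyBulk(src: Uint8Array, dst: Uint8Array, dstOffset: number): void {
  dst.set(src, dstOffset); // throws RangeError if src does not fit at dstOffset
}
```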
| |
| ▲ | hirako2000 3 days ago | parent | prev | next [-] | | The issue is that you would be not just clueless but also naive about the correctness of what it did. Knowing what to do, you can at least review. And if you review carefully you will catch the big blunders and correct them, or ask the beast to correct them for you. > Claude, please generate a safe random number. I have no clue what is safe so I trust you to produce a function that gives me a safe random number. Not every use case is sensitive, but even building pieces for entertainment, if it wipes things it shouldn't delete or drains the battery doing very inefficient operations here and there, it's junk, undesirable software. |
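To make the "safe random number" example concrete, here is a small TypeScript/Node sketch (illustrative only) of the kind of difference a reviewer without domain knowledge would miss:

```typescript
import { randomInt } from "node:crypto";

// Plausible LLM output: looks reasonable, but Math.random() is not
// cryptographically secure and must not be used for tokens or secrets.
const unsafeToken = Math.floor(Math.random() * 1_000_000);

// CSPRNG-backed equivalent from Node's crypto module (upper bound exclusive).
const safeToken = randomInt(0, 1_000_000);
```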
| ▲ | jacobr1 3 days ago | parent | prev [-] | | And importantly, the cycle time on this stuff can be much faster. Trying out different variants, and iterating through larger changes, can be huge. |
| |
| ▲ | teaearlgraycold 3 days ago | parent | prev [-] | | LLMs are great at semantic searching through packages when I need to know exactly how something is implemented. If that’s a major part of your job then you’re saving a ton of time with what’s available today. |
|
| |
| ▲ | gwd 3 days ago | parent | prev | next [-] | | > How do we reconcile these two comments? I think that's a core question of the industry right now. The question is, for those people who feel like things are going faster, what's the actual velocity? A month ago I showed it a basic query of one resource I'd rewritten to use a "query builder" API. Then I showed it the "legacy" query of another resource, and asked it to do something similar. It managed to get very close on the first try, and with only a few more hours of tweaking and testing managed to get a reasonably thorough test suite to pass. I'm sure that took half the time it would have taken me to do it by hand. Fast forward to this week, when I ran across some strange bugs, and had to spend a day or two digging into the code again, and do some major revision. Pretty sure those bugs wouldn't have happened if I'd written the code myself; but even though I reviewed the code, they went under the radar, because I hadn't really understood the code as well as I thought I had. So was I faster overall? Or did I just offload some of the work to myself at an unpredictable point in the future? I don't "vibe code": I keep a tight rein on the tool and review everything it's doing. | |
| ▲ | Gigachad 3 days ago | parent | next [-] | | Pretty much. We are in an era of vibe efficiency. If programmers really did get 3x faster, why hasn't software improved any faster than it always has? | |
| ▲ | lfowles 3 days ago | parent [-] | | Probably because we're attempting to make 3x more products |
| |
| |
| ▲ | delegate 3 days ago | parent | prev | next [-] | | Easy. You're 3x more productive for a while and then you burn yourself out. Or lose control of the codebase, which you no longer understand after weeks of vibing (since we can only think and accumulate knowledge at 1x). Sometimes the easy way out is throwing a week of generated code away and starting over. So that 3x doesn't come for free at all, besides API costs, there's the cost of quickly accumulating tech debt which you have to pay if this is a long term project. For prototypes, it's still amazing. | | |
| ▲ | brulard 3 days ago | parent | next [-] | | You conflate efficient usage of AI with "vibing". Code can be written by AI and still follow the agreed-upon structures and rules and still can and should be thoroughly reviewed. The 3x absolutely does not come for free. But the price may have been paid in advance by learning how to use those tools best. I agree the vibe-coding mentality is going to be a major problem. But aren't all tools used well and used badly? | |
| ▲ | Aeolun 3 days ago | parent | prev [-] | | > Or lose control of the codebase, which you no longer understand after weeks of vibing (since we can only think and accumulate knowledge at 1x). I recognize this, but at the same time, I’m still better at remembering the scope of the codebase than Claude is. If Claude gets a 1M context window, we can start sticking a general overview of the codebase in every single prompt. |
| |
| ▲ | quikoa 3 days ago | parent | prev | next [-] | | It's not just about the programmer and his experience with AI tools. The problem domain and programming language(s) used for a particular project may have a large impact on how effective the AI can be. | | |
| ▲ | vitaflo 3 days ago | parent | next [-] | | But even on the same project with the same tools the general way a dev derives satisfaction from their work can play a big role. Some devs derive satisfaction from getting work done and care less about the code as long as it works. Others derive satisfaction from writing well architected and maintainable code. One can guess the reactions to how LLM's fit into their day to day lives for each. | |
| ▲ | wiremine 3 days ago | parent | prev [-] | | > The problem domain and programming language(s) used for a particular project may have a large impact on how effective the AI can be. 100%. Again, if we only focus on things like context windows, we're missing the important details. |
| |
| ▲ | jeremy_k 3 days ago | parent | prev | next [-] | | Well put. It really does come down to nuance. I find Claude is amazing at writing React / Typescript. I mostly let it do its own thing and skim the results after. I have it write Storybook components so I can visually confirm things look how I want. If something isn't quite right I'll take a look and if I can spot the problem and fix it myself, I'll do that. If I can't quickly spot it, I'll write up a prompt describing what is going on and work through it with AI assistance. Overall, for React / Typescript I heavily let Claude write the code. The flip side of this is my server code is Ruby on Rails. Claude helps me a lot less here because this is my primary coding background. I also have a certain way I like to write Ruby. In these scenarios I'm usually asking Claude to generate tests for code I've already written and supplying lots of examples in context so the coding style matches. If I ask Claude to write something novel in Ruby I tend to use it as more of a jumping off point. It generates, I read, I refactor to my liking. Claude is still very helpful, but I tend to do more of the code writing for Ruby. Overall, helpful for Ruby, I still write most of the code. These are the nuances I've come to find and what works best for my coding patterns. But to your point, if you tell someone "go use Claude" and they have a preference in how to write Ruby and they see Claude generate a bunch of Ruby they don't like, they'll likely dismiss it as "This isn't useful. It took me longer to rewrite everything than just doing it myself". Which all goes to say, time using the tools, whether it's Cursor, Claude Code, etc (I use OpenCode), is the biggest key, but figuring out how to get over the initial hump is probably the biggest hurdle. | |
| ▲ | jorvi 3 days ago | parent | next [-] | | It is not really a nuanced take when it compares 'unassisted' coding to using a bicycle and AI-assisted coding with a truck. I put myself somewhere in the middle in terms of how great I think LLMs are for coding, but anyone that has worked with a colleague that loves LLM coding knows how horrid it is that the team has to comb through and doublecheck their commits. In that sense it would be equally nuanced to call AI-assisted development something like "pipe bomb coding". You toss out your code into the branch, and your non-AI'd colleagues have to quickly check if your code is a harmless tube of code or yet another contraption that quickly needs defusing before it blows up in everyone's face. Of course that is not nuanced either, but you get the point :) | | |
| ▲ | LinXitoW 3 days ago | parent [-] | | How nuanced the comparison seems also depends on whether you live in Arkansas or in Amsterdam. But I disagree that your counterexample has anything at all to do with AI coding. That very same developer was perfectly capable of committing untested crap without AI. Perfectly capable of copy pasting the first answer they found on Stack Overflow. Perfectly capable of recreating utility functions over and over because they were too lazy to check if they already exist. |
| |
| ▲ | k9294 3 days ago | parent | prev | next [-] | | For this very reason I switched to TS for the backend as well. I'm not a big fan of JS, but the productivity gain of having shared types between frontend and backend, and Claude Code's proficiency with TS, is immense. | |
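For anyone unfamiliar with the shared-types setup being described, a minimal sketch in TypeScript (the Todo type and function names are made up for illustration):

```typescript
// One type definition imported by both the API handler and the UI code,
// so the compiler catches mismatches between what the server returns
// and what the frontend expects.
export interface Todo {
  id: string;
  title: string;
  done: boolean;
}

// Server side (hypothetical handler).
export function listOpenTodos(store: Todo[]): Todo[] {
  return store.filter((t) => !t.done);
}

// Client side: the same interface types the parsed response.
export async function fetchTodos(): Promise<Todo[]> {
  const res = await fetch("/api/todos");
  return (await res.json()) as Todo[];
}
```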
| ▲ | jeremy_k 3 days ago | parent [-] | | I considered this, but I'm just too comfortable writing my server logic in Ruby on Rails (as I do that for my day job and side project). I'm super comfortable writing client side React / Typescript but whenever I look at server side Typescript code I'm like "I should understand what this is doing but I don't" haha. |
| |
| ▲ | croes 3 days ago | parent | prev [-] | | Do you only skim the results or do you audit them at some point to prevent security issues? | | |
| ▲ | jeremy_k 3 days ago | parent [-] | | What kind of security issues are you thinking about? I'm generating UI components like Selects for certain data types or Charts of data. | | |
| ▲ | dghlsakjg 3 days ago | parent | next [-] | | User input is a notoriously thorny area. If you aren't sanitizing and checking the inputs appropriately somewhere between the user and trusted code, you WILL get pwned. Rails provides default ways to avoid this, but it makes it very easy to do whatever you want with user input. Rails will not necessarily throw a warning if your AI decides that it wants to directly interpolate user input into a sql query. | | |
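The interpolation pitfall being described looks roughly the same in any stack; here is a sketch in TypeScript with node-postgres (table and column names are made up):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// What generated code can easily do: splice user input straight into SQL.
// Input like `' OR '1'='1` changes the meaning of the whole query.
async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// Parameterized version: the driver sends the value separately from the SQL text.
async function findUserSafe(email: string) {
  return pool.query("SELECT * FROM users WHERE email = $1", [email]);
}
```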
| ▲ | jeremy_k 3 days ago | parent [-] | | Well in this case, I am reading through everything that is generated for Rails because I want things to be done my way. For user input, I tend to validate everything with Zod before sending it off the backend which then flows through ActiveRecord. I get what you're saying that AI could write something that executes user input but with the way I'm using the tools that shouldn't happen. |
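A minimal sketch of the Zod validation step mentioned above (the schema fields are hypothetical):

```typescript
import { z } from "zod";

// Schema for a hypothetical form payload; parse() throws on anything off-shape,
// so malformed input is rejected before it ever reaches the backend.
const NewEntrySchema = z.object({
  title: z.string().min(1).max(200),
  dueDate: z.string().optional(),
});

type NewEntry = z.infer<typeof NewEntrySchema>;

export function validateNewEntry(input: unknown): NewEntry {
  return NewEntrySchema.parse(input);
}
```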
| |
| ▲ | croes 3 days ago | parent | prev [-] | | Do these components have JS, do they have npm dependencies? Since AI slopsquatting is a thing https://en.wikipedia.org/wiki/Slopsquatting | | |
| ▲ | jeremy_k 3 days ago | parent [-] | | I do not have AI install packages or do things like run Git commands for me. |
|
|
|
| |
| ▲ | troupo 3 days ago | parent | prev | next [-] | | > How do we reconcile these two comments? I think that's a core question of the industry right now. We don't. Because there's no hard data: https://dmitriid.com/everything-around-llms-is-still-magical... And when hard data of any kind does start appearing, it may actually point in a different direction: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... > We need to shift the conversation to techniques, and away from the tools. No, you're asking to shift the conversation to magical incantations which experts claim work. What we need to do is shift the conversation to measurements. | |
| ▲ | chasd00 3 days ago | parent | prev | next [-] | | One thing to think about is many software devs have a very hard time with code they didn't write. I've seen many devs do a lot of work to change code to something equivalent (even with respect to performance and readability) only because it's not the way they would have done it. I could see people having a hard time using what the LLM produced without having to "fix it up" and basically re-write everything. | | |
| ▲ | jama211 3 days ago | parent [-] | | Yeah sometimes I feel like a unicorn because I don’t really care about code at all, so long as it conforms to decent standards and does what it needs to do. I honestly believe engineers often overestimate the importance of elegance in code, to the point of not realising that the slowdown of a project caused by overly perfect code is genuinely not worth it. | |
| ▲ | parpfish 3 days ago | parent [-] | | I don't care if the code is elegant, I care that the code is consistent. Do the same thing in the same way each time and it lets you chunk it up and skim it much easier. If there are little differences each time, you have to keep asking yourself "is it done differently here for a particular reason?" | |
| ▲ | vanviegen 3 days ago | parent | next [-] | | Exactly! And besides that, new code being consistent with its surrounding code used to be a sign of careful craftsmanship (as opposed to spaghetti-against-the-wall style coding), giving me some confidence that the programmer may have considered at least the most important nasty edge cases. LLMs have rendered that signal mostly useless, of course. | | |
| ▲ | jama211 a day ago | parent [-] | | Ehh, in my experience if you are using an LLM in context they are better these days at conforming to the code style around it, especially if you put it in your rules that you wish it to. |
| |
| ▲ | jama211 a day ago | parent | prev [-] | | Absolutely fair, and a great method. |
|
|
| |
| ▲ | unoti 3 days ago | parent | prev | next [-] | | > Having spent a couple of weeks on Claude Code recently, I arrived at the conclusion that the net value for me from agentic AI is actually negative.
> For me it’s meant a huge increase in productivity, at least 3X.
> How do we reconcile these two comments? I think that's a core question of the industry right now. Every success story with AI coding involves giving the agent enough context to succeed on a task that it can see a path to success on. And every story where it fails is a situation where it did not have enough context to see a path to success. Think about what happens with a junior software engineer: you give them a task and they either succeed or fail. If they succeed wildly, you give them a more challenging task. If they fail, you give them more guidance, more coaching, and less challenging tasks with more personal intervention from you to break it down into achievable steps. As models and tooling become more advanced, the place where that balance lies shifts. The trick is to ride that sweet spot of task breakdown and guidance and supervision. | |
| ▲ | hirako2000 3 days ago | parent | next [-] | | Bold claims. From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input. In particular when details are provided, in fact. I find that with solutions likely to be well oiled in the training data, a well formulated set of *basic* requirements often leads to a zero shot, "a" perfectly valid solution. I say "a" solution because there is still this probability (seed factor) that it will not honour part of the demands. E.g., build a to-do list app for the browser, persist entries into a hashmap, no duplicates, can edit and delete, responsive design. I never recall seeing an LLM kick off C++ code out of that. But I also don't recall any LLM succeeding in all these requirements, even though there aren't that many. It may use a hash set, or even a set for persistence because it avoids duplicates out of the box. And it would even use a hash map to show it used a hashmap, but as an intermediary data structure. It would be responsive, but the edit/delete buttons may not show, or may not be functional. Saving the edits may look like it worked, but did not. The comparison with junior developers falls flat. Even a mediocre developer can test it and won't pretend that it works if it doesn't even execute. If a developer lies too many times, they lose trust. We forgive these machines because they are just automatons with a label on them: "can make mistakes". We have no recourse to make them speak the truth; they lie by design. | |
| ▲ | brulard 3 days ago | parent | next [-] | | > From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input. You may feel like there are all the details and no ambiguity in the prompt. But there may still be missing parts, like examples, structure, a plan, or division into smaller parts (it can do that quite well if explicitly asked for). If you give too many details at once, it gets confused, but there are ways to let the model access context as it progresses through the task. And models are just one part of the equation. Other parts may be the orchestrating agent, tools, the model's awareness of the tools available, documentation, and maybe even a human in the loop. | |
| ▲ | epolanski 3 days ago | parent | prev [-] | | > From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input. Please provide the examples, both of the problem and your input so we can double check. |
| |
| ▲ | troupo 3 days ago | parent | prev [-] | | > And every story where it fails is a situation where it had not enough context to see a path to success on. And you know that because people are actively sharing the projects, code bases, programming languages and approaches they used? Or because your gut feeling is telling you that? For me, agents failed with enough context, and with not enough context, and succeeded with context, or not enough, and succeeded and failed with and without "guidance and coaching" |
| |
| ▲ | worldsayshi 3 days ago | parent | prev | next [-] | | I think it's very much down to which kind of problem you're trying to solve. If a solution can subtly fail and it is critical that it doesn't, LLM is net negative. If a solution is easy to verify or if it is enough that it walks like a duck and quacks like one, LLM can be very useful. I've had examples of both lately. I'm very much both bullish and bearish atm. | |
| ▲ | abc_lisper 3 days ago | parent | prev | next [-] | | I doubt there is much art to getting an LLM to work for you, despite all the hoopla. Any competent engineer can figure that much out. The real dichotomy is this. If you are aware of the tools/APIs and the domain, you are better off writing the code on your own, except maybe shallow changes like refactorings. OTOH, if you are not familiar with the domain/tools, using an LLM gives you a huge leg up by preventing you from getting stuck and providing initial momentum. | |
| ▲ | jama211 3 days ago | parent | next [-] | | I dunno, first time I tried an LLM I was getting so annoyed because I just wanted it to go through a css file and replace all colours with variables defined in root, and it kept missing stuff and spinning and I was getting so frustrated. Then a friend told me I should instead just ask it to write a script which accomplishes that goal, and it did it perfectly in one prompt, then ran it for me, and also wrote another script to check it hadn’t missed any and ran that. At no point when it was getting stuck initially did it suggest another approach, or complain that it was outside its context window even though it was. This is a perfect example of “knowing how to use an LLM” taking it from useless to useful. | |
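For the curious, the kind of script described above might look like the rough sketch below (TypeScript/Node; a real stylesheet would deserve a proper CSS parser rather than a regex, and the variable naming is made up):

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Collect every hex colour in a stylesheet, assign each one a --color-N
// variable, rewrite the usages as var() references, and prepend a :root block.
const path = process.argv[2] ?? "styles.css";
const css = readFileSync(path, "utf8");

const vars = new Map<string, string>();
const rewritten = css.replace(/#[0-9a-fA-F]{3,8}\b/g, (hex) => {
  const key = hex.toLowerCase();
  if (!vars.has(key)) vars.set(key, `--color-${vars.size + 1}`);
  return `var(${vars.get(key)})`;
});

const rootBlock =
  ":root {\n" +
  [...vars].map(([hex, name]) => `  ${name}: ${hex};`).join("\n") +
  "\n}\n\n";

writeFileSync(path, rootBlock + rewritten);
```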
| ▲ | abc_lisper 3 days ago | parent [-] | | Which one did you use and when was this? I mean, nobody gets anything working right the first time. You've got to spend at least a few days trying to understand the tool | |
| ▲ | jama211 a day ago | parent [-] | | It’s just a simple example of how knowing how to use a tool can make all the difference, and that can be improved upon with time. I’m not sure why you’re taking umbrage with that idea. I know this style of arguing you’re going for. If I answer your questions, you’ll attack the specific model or use case I was in, or claim it was too simple/basic a use case, or some other nitpick about the specifics instead of in good faith attempting to take my point as stated. I won’t allow you to force control of the frame of the conversation by answering your questions, also because the answers wouldn’t do anything to change the spirit of my main point. | | |
| ▲ | jama211 a day ago | parent [-] | | (Inb4 “you won’t tell me because it’s a crap model or some other petty excuse” - FYI, it wasn’t) |
|
|
| |
| ▲ | badlucklottery 3 days ago | parent | prev | next [-] | | This is my experience as well. LLMs currently produce pretty mediocre code. A lot of that is a "garbage in, garbage out" issue but it's just the current state of things. If the alternative is noob code or just not doing a task at all, then mediocre is great. But 90% of the time I'm working in a familiar language/domain so I can grind out better code relatively quickly and do so in a way that's cohesive with nearby code in the codebase. The main use-case I have for AI in that case is writing the trivial unit tests for me. So it's another "No Silver Bullet" technology where the problem it's fixing isn't the essential problem software engineers are facing. | |
| ▲ | brulard 3 days ago | parent | prev [-] | | I believe there IS much art in LLMs and Agents especially. Maybe you can get like 20% boost quite quickly, but there is so much room to grow it to maybe 500% long term. |
| |
| ▲ | sixothree 3 days ago | parent | prev | next [-] | | It might just be me, but I feel like it excels with certain languages while in other situations it falls flat. Throw a well architected and documented code base in a popular language at it and you can definitely feel it get into its groove. Also, giving it tools to ensure success is just as important. MCPs can sometimes make a world of difference, especially when it needs to search your code base. | |
| ▲ | dennisy 3 days ago | parent | prev | next [-] | | Also another view is that developers below a certain level get a positive benefit and those above get a negative effect. This makes sense, as the models are an average of the code out there and some of us are above and below that average. Sorry btw I do not want to offend anyone who feels they do garner a benefit from LLMs, just wanted to drop in this idea! | | |
| ▲ | smokel 3 days ago | parent | next [-] | | My experience was exactly the opposite. Experienced developers know when the LLM goes off the rails, and are typically better at finding useful applications. Junior developers on the other hand, can let horrible solutions pass through unchecked. Then again, LLMs are improving so quickly, that the most recent ones help juniors to learn and understand things better. | |
| ▲ | rzz3 3 days ago | parent | prev | next [-] | | It’s also really good for me as a very senior engineer with serious ADHD. Sometimes I get very mentally blocked, and telling Claude Code to plan and implement a feature gives me a really valuable starting point and has a way of unblocking me. For me it’s easier to elaborate off of an existing idea or starting point and refactor than start a whole big thing from zero on my own. | |
| ▲ | parpfish 3 days ago | parent | prev | next [-] | | I don't know if anybody else has experienced this, but one of my biggest time-sucks with cursor is that it doesn't have a way for me to steer it mid-process that I'm aware of. It'll build something that fails a test, but I know how to fix the problem. I can't jump in and manually fix it or tell it what to do. I just have to watch it churn through the problem and eventually give up and throw away a 90% good solution that I knew how to fix. | |
| ▲ | ath3nd 3 days ago | parent | prev [-] | | That's my anecdotal experience as well! Junior devs struggle with a lot of things: - syntax - iteration over an idea - breaking down the task and verifying each step Working with a tool like Claude that gets them started quickly and iterates on the solution together with them helps them tremendously and educates them on best practices in the field. Contrast that with a seasoned developer with domain experience, good command of the programming language, knowledge of the best practices and a clear vision of how the things can be implemented. They hardly need any help on those steps where the junior struggled and where the LLMs shine, maybe some quick check on the API, but that's mostly it. That's consistent with the finding of the study https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... that experienced developers' performance suffered when using an LLM. What I used as a metaphor before to describe this phenomenon is training wheels: kids learning how to ride a bike can get the basics with the help and safety of the wheels, but adults who can already ride a bike don't have any use for the training wheels, and can often find themselves restricted by them. | |
| ▲ | epolanski 3 days ago | parent [-] | | > that experienced developers' performance suffered when using an LLM That experiment is really non significant. A bunch of OSS devs without much training in the tools used them for very little time and found it to be a net negative. | | |
| ▲ | ath3nd 3 days ago | parent [-] | | > That experiment is really non significant That's been anecdotally my experience as well: I have found juniors benefitted the most so far in professional settings with lots of time spent on learning the tools. Senior devs either suffered or didn't experience an improvement. The only study so far also corroborates that anecdotal experience. We can wait for other studies that are more relevant and with larger sample sizes, but so far the only folks actually trying to measure productivity found a negative effect, so I am more inclined to believe it until other studies come along. | |
|
|
| |
| ▲ | nhaehnle 3 days ago | parent | prev | next [-] | | I just find it hard to take the 3x claims at face value because actual code generation is only a small part of my job, and so Amdahl's law currently limits any productivity increase from agentic AI to well below 2x for me. (And I believe I'm fairly typical for my team. While there are more junior folks, it's not that I'm just stuck with powerpoint or something all day. Writing code is rarely the bottleneck.) So... either their job is really just churning out code (where do these jobs exist, and are there any jobs like this at all that still care about quality?) or the most generous explanation that I can think of is that people are really, really bad at self-evaluations of productivity. | |
| ▲ | jdgoesmarching 3 days ago | parent | prev | next [-] | | Agreed, and it drives me bonkers when people talk about AI coding as if it represents a single technique, process, or tool. Makes me wonder if people spoke this way about “using computers” or “using the internet” in the olden days. We don’t even fully agree on the best practices for writing code without AI. | |
| ▲ | mh- 3 days ago | parent | next [-] | | > Makes me wonder if people spoke this way about “using computers” or “using the internet” in the olden days. Older person here: they absolutely did, all over the place in the early 90s. I remember people decrying projects that moved them to computers everywhere I went. Doctors offices, auto mechanics, etc. Then later, people did the same thing about the Internet (was written with a single word capital I by 2000, having been previously written as two separate words.) https://i.imgur.com/vApWP6l.png | | | |
| ▲ | moregrist 3 days ago | parent | prev [-] | | > Makes me wonder if people spoke this way about “using computers” or “using the internet” in the olden days. There were gobs of terrible road metaphors that spun out of calling the Internet the “Information Superhighway.” Gobs and gobs of them. All self-parody to anyone who knew anything. I hesitate to relate this to anything in the current AI era, but maybe the closest (and in a gallows humor/doomer kind of way) is the amount of exec speak on how many jobs will be replaced. | | |
| ▲ | porksoda 3 days ago | parent [-] | | Remember the ones who loudly proclaimed the internet to be a passing fad, not useful for normal people. All anti LLM rants taste like that to me. I get why they thought that - it was kind of crappy unless you're one who is excited about the future and prepared to bleed a bit on the edge. | | |
| ▲ | benterix 3 days ago | parent [-] | | > Remember the ones who loudly proclaimed the internet to be a passing fad, not useful for normal people. All anti LLM rants taste like that to me. For me they're very different and they sound much more the crypto-skepticism. It's not like "LLMs are worthless, there are no use cases, they should be banned" but rather "LLMs do have their use cases but they also do have inherent flaws that need to be addressed; embedding them in every product makes no sense etc.". (I mean LLMs as tech, what's happening with GenAI companies and their leaders is a completely different matter and we have every right to criticize every lie, hypocrisy and manipulation, but let's not mix up these two.) |
|
|
| |
| ▲ | bloomca 3 days ago | parent | prev | next [-] | | > 2. When NOT to use the truck... when talking or the bike is actually the better way to go. Some people write racing car code, where a truck just doesn't bring much value. Some people go into more uncharted territories, where there are no roads (so the truck will not only slow you down, it will bring a bunch of dead weight). If the road is straight, AI is wildly good. In fact, it is probably _too_ good; but it can easily miss a turn and it will take a minute to get it on track. I am curious if we'll able to fine tune LLMs to assist with less known paths. | |
| ▲ | Ianjit 3 days ago | parent | prev | next [-] | | "How do we reconcile these two comments? I think that's a core question of the industry right now." There is no correlation between developers self assessment of their productivity and their actual productivity. https://www.youtube.com/watch?v=tbDDYKRFjhk | |
| ▲ | pesfandiar 3 days ago | parent | prev | next [-] | | Your analogy would be much better with giving workers a work horse with a mind of its own. Trucks come with clear instructions and predictable behaviour. | | |
| ▲ | chasd00 3 days ago | parent [-] | | > Your analogy would be much better with giving workers a work horse with a mind of its own. i think this is a very insightful comment with respect to working with LLMs. If you've ever ridden a horse you don't really tell it to walk, run, turn left, turn right, etc you have to convince it to do those things and not be too aggravating while you're at it. With a truck simple cause and effect applies but with horse it's a negotiation. I feel like working with LLMs is like a negotiation, you have to coax out of it what you're after. |
| |
| ▲ | jf22 3 days ago | parent | prev | next [-] | | A couple of weeks isn't enough. I'm six months in using LLMs to generate 90% of my code and finally understanding the techniques and limitations. | |
| ▲ | ath3nd 3 days ago | parent | prev | next [-] | | > How do we reconcile these two comments? I think that's a core question of the industry right now. The freshest study focusing on experienced developers showed a net negative effect on productivity when using an LLM solution in their flow: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... My conclusion on this, as an ex VP of Engineering, is that good senior developers find little utility in LLMs and even find them to be a nuisance/detriment, while for juniors, they can be a godsend, as they help them with syntax and coax the solution out of them. It's like training wheels on a bike. A toddler might find 3x utility, while a person who can actually ride a bike well will find themselves restricted by training wheels. | |
| ▲ | jg0r3 3 days ago | parent | prev | next [-] | | Three things I've noticed as a dev whose field involves a lot of niche software development. 1. LLMs seem to benefit 'hacker-type' programmers from my experience. People who tend to approach coding problems in a very "kick the TV from different angles and see if it works" strategy. 2. There seems to be two overgeneralized types of devs in the market right now: Devs who make niche software and devs who make web apps, data pipelines, and other standard industry tools. LLMs are much better at helping with the established tool development at the moment. 3. LLMs are absolute savants at making clean-ish looking surface level tech demos in ~5 minutes, they are masters of selling "themselves" to executives. Moving a demo to a production stack? Eh, results may vary to say the least. I use LLMs extensively when they make sense for me. One fascinating thing for me is how different everyone's experience with LLMs is. Obviously there's a lot of noise out there. With AI haters and AI tech bros kind of muddying the waters with extremist takes. | |
| ▲ | pletnes 3 days ago | parent | prev | next [-] | | Being a consultant / programmer with feet on the ground, eh, hands on the keyboard: some orgs let us use some AI tools, others do not. Some projects are predominantly new code based on recent tech (React); others include maintaining legacy stuff on windows server and proprietary frameworks. AI is great on some tasks, but unavailable or ignorant about others. Some projects have sharp requirements (or at least, have requirements) whereas some require 39 out of 40 hours a week guessing at what the other meat-based intelligences actually want from us. What «programming» actually entails, differs enormously; so does AI’s relevance. | |
| ▲ | nabla9 3 days ago | parent | prev | next [-] | | I agree. I experience a productivity boost, and I believe it’s because I prevent LLMs from making design choices or handling creative tasks. They’re best used as a "code monkey": fill in function bodies once I’ve defined them. I design the data structures, functions, and classes myself. LLMs also help with learning new libraries by providing examples, and they can even write unit tests that I manually check. Importantly, no code I haven’t read and accepted ever gets committed. Then I see people doing things like "write an app for ....", run, hey it works! WTF? | |
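A sketch of the "code monkey" split being described: the human fixes the types and signatures, and the model only fills in the marked bodies. Everything below is hypothetical, not from the commenter's codebase.

```typescript
// Hand-written skeleton: the interface and signatures pin down the design.
export interface Invoice {
  id: string;
  lines: { description: string; cents: number }[];
  issuedAt: Date;
}

// Bodies like these are what gets delegated to the model, then reviewed.
export function totalCents(invoice: Invoice): number {
  // Sum of line amounts; amounts are already integer cents, so no rounding.
  return invoice.lines.reduce((sum, line) => sum + line.cents, 0);
}

export function isOverdue(invoice: Invoice, now: Date, netDays: number): boolean {
  // True if `now` is more than `netDays` days after the issue date.
  const dueMs = invoice.issuedAt.getTime() + netDays * 24 * 60 * 60 * 1000;
  return now.getTime() > dueMs;
}
```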
| ▲ | epolanski 3 days ago | parent | prev | next [-] | | This is a very sensible point. | |
| ▲ | oceanplexian 3 days ago | parent | prev [-] | | It's pretty simple, AI is now political for a lot of people. Some folks have a vested interest in downplaying it or over hyping it rather than impartially approaching it as a tool. | | |
| ▲ | Gigachad 3 days ago | parent [-] | | It’s also just not consistent. A manager who can’t code using it to generate a react todo list thinks it’s 100x efficiency while a senior software dev working on established apps finds it a net productivity negative. AI coding tools seem to excel at demos and flop on the field so the expectation disconnect between managers and actual workers is massive. |
|
| |
| ▲ | rs186 3 days ago | parent | prev | next [-] | | 3X if not 10X if you are starting a new project with Next.js, React, Tailwind CSS for a fullstack website development, that solves an everyday problem. Yeah I just witnessed that yesterday when creating a toy project. For my company's codebase, where we use internal tools and proprietary technology, solving a problem that does not exist outside the specific domain, on a codebase of over 1000 files? No way. Even locating the correct file to edit is non trivial for a new (human) developer. | | |
| ▲ | mike_hearn 3 days ago | parent | next [-] | | My codebase has about 1500 files and is highly domain specific: it's a tool for shipping desktop apps[1] that handles all the building, packaging, signing, uploading etc for every platform on every OS simultaneously. It's written mostly in Kotlin, and to some extent uses a custom in-house build system. The rest of the build is Gradle, which is a notoriously confusing tool. The source tree also contains servers, command line tools and a custom scripting language which is used for all the scripting needs of the project [2]. The code itself is quite complex and there's lots of unusual code for munging undocumented formats, speaking undocumented protocols, doing cryptography, Mac/Windows specific APIs, and it's all built on a foundation of a custom parallel incremental build system. In other words: nightmare codebase for an LLM. Nothing like other codebases. Yet, Claude Code demolishes problems in it without a sweat. I don't know why people have different experiences but speculating a bit: 1. I wrote most of it myself and this codebase is unusually well documented and structured compared to most. All the internal APIs have full JavaDocs/KDocs, there are extensive design notes in Markdown in the source tree, the user guide is also part of the source tree. Files, classes and modules are logically named. Files are relatively small. All this means Claude can often find the right parts of the source within just a few tool uses. 2. I invested in making a good CLAUDE.md and also wrote a script to generate "map.md" files that are at the top of every module. These map files contain one-liners of what every source file contains. I used Gemini to make these due to its cheap 1M context window. If Claude does struggle to find the right code by just reading the context files or guessing, it can consult the maps to locate the right place quickly. 3. I've developed a good intuition for what it can and cannot do well. 4. I don't ask it to do big refactorings that would stress the context window. IntelliJ is for refactorings. AI is for writing code. [1] https://hydraulic.dev [2] https://hshell.hydraulic.dev/ | |
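The "map.md" idea is easy to approximate. A rough sketch follows (TypeScript/Node; the layout and naming are guesses, and it uses a file's first comment as the one-liner where the author used an LLM to write better summaries):

```typescript
import { readdirSync, readFileSync, writeFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Take a file's first comment line as its one-line summary.
function firstCommentLine(path: string): string {
  const line = readFileSync(path, "utf8")
    .split("\n")
    .map((l) => l.trim())
    .find((l) => l.startsWith("//") || l.startsWith("*"));
  return line ? line.replace(/^[/*\s]+/, "") : "(no summary)";
}

// Emit one "- file: summary" line per source file in the module directory.
export function writeMap(moduleDir: string): void {
  const entries = readdirSync(moduleDir)
    .filter((name) => statSync(join(moduleDir, name)).isFile())
    .map((name) => `- ${name}: ${firstCommentLine(join(moduleDir, name))}`);
  writeFileSync(join(moduleDir, "map.md"), entries.join("\n") + "\n");
}
```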
| ▲ | GenerocUsername 3 days ago | parent | prev | next [-] | | Your first week of AI usage should be crawling your codebase and generating context.md docs that can then be fed back into future prompts so that AI understands your project space, packages, apis, and code philosophy. I guarantee your internal tools are not revolutionary, they are just unrepresented in the ML model out of the box | | |
| ▲ | orra 3 days ago | parent | next [-] | | That sounds incredibly boring. Is it effective? If so I'm sure we'll see models to generate those context.md files. | | |
| ▲ | cpursley 3 days ago | parent [-] | | Yes. And way less boring than manually reading a section of a codebase to understand what is going on after being away from it for 8 months. Claude's docs and git commit writing skills are worth it for that alone. |
| |
| ▲ | blitztime 3 days ago | parent | prev | next [-] | | How do you keep the context.md updated as the code changes? | | |
| ▲ | shmoogy 3 days ago | parent [-] | | I tell Claude to update it generally but you can probably use a hook | | |
| ▲ | tombot 3 days ago | parent [-] | | This. While it has context of the current problem, just ask Claude to re-read its own documentation and think of things to add that will help it in the future |
|
| |
| ▲ | nicce 3 days ago | parent | prev [-] | | Even then, are you even allowed to use AI on such a codebase? Is some part of the code "bought", e.g. generated by a commercial compiler with a specific license? Is a pinky promise from the LLM provider enough? | |
| ▲ | GenerocUsername 3 days ago | parent [-] | | Are the resources to understand the code on a computer? Whether it's code, swagger, or a collection of sticky notes, your job is now to supply context to the AI. I am 100% convinced people who are not getting value from AI would have trouble explaining how to tie shoes to a toddler |
|
| |
| ▲ | MattGaiser 3 days ago | parent | prev | next [-] | | Yeah, anecdotally it is heavily dependent on: 1. Using a common tech. It is not as good at Vue as it is at React. 2. Using it in a standard way. To get AI to really work well, I have had to change my typical naming conventions (or specify them in detail in the instructions). | | |
| ▲ | nicce 3 days ago | parent [-] | | React also seems to effectively be an alias for Next.js. Models have a hard time telling the difference. | |
| |
| ▲ | tptacek 3 days ago | parent | prev [-] | | That's an interesting comment, because "locating the correct file to edit" was the very first thing LLMs did that was valuable to me as a developer. |
| |
| ▲ | elevatortrim 3 days ago | parent | prev | next [-] | | I think there are two broad cases where ai coding is beneficial: 1. You are a good coder but working on a project that is new to you, building a new project, or working with a technology you are not familiar with. This is where AI is hugely beneficial. It not only accelerates you, it lets you do things you could not otherwise. 2. You have spent a lot of time on engineering your context and learning what AI is good at, and using it very strategically where you know it will save time and not bothering otherwise. If you are a really good coder, really familiar with the project, and mostly changing its bits and pieces rather than building new functionality, AI won’t accelerate you much. Especially if you did not invest the time to make it work well. | |
| ▲ | acedTrex 3 days ago | parent | prev | next [-] | | I have yet to get it to generate code past 10ish lines that I am willing to accept. I read stuff like this and wonder how low yall's standards are, or if you are working on projects that just do not matter in any real world sense. | | |
| ▲ | dillydogg 3 days ago | parent | next [-] | | Whenever I read comments from the people singing their praises of the technology, it's hard not to think of the study that found AI tools made developers slower in early 2025. >When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... | | |
| ▲ | logicprog 3 days ago | parent | next [-] | | Here's an in depth analysis and critique of that study by someone whose job is literally to study programmers psychologically and has experience in sociology studies: https://www.fightforthehuman.com/are-developers-slowed-down-... Basically, the study has a fuckton of methodological problems that seriously undercut the quality of its findings, and even assuming its findings are correct, if you look closer at the data, it doesn't show what it claims to show regarding developer estimations, and the story of whether it speeds up or slows down developers is actually much more nuanced and precisely mirrors what the developers themselves say in the qualitative quote questionnaire, and relatively closely mirrors what the more nuanced people will say here — that it helps with things you're less familiar with, that have scope creep, etc. a lot more, but is less or even negatively useful for the opposite scenarios — even in the worst case setting. Not to mention this is studying a highly specific and rare subset of developers, and they even admit it's a subset that isn't applicable to the whole. | |
| ▲ | mstkllah 3 days ago | parent | prev [-] | | Ah, the very extensive study with 16 developers. Bulletproof results. | | |
| ▲ | troupo 3 days ago | parent | next [-] | | Compared to "it's just a skill issue you're not prompting it correctly" crowd with literally zero actionable data? | |
| ▲ | izacus 3 days ago | parent | prev [-] | | Yeah, we should listen to the one "trust me bro" dude instead. |
|
| |
| ▲ | spicyusername 3 days ago | parent | prev | next [-] | | 4/5 times I can easily get 100s of lines output, that only needs a quick once over. 1/5 times, I spend an extra hour tangled in code it outputs that I eventually just rewrite from scratch. Definitely a massive net positive, but that 20% is extremely frustrating. | | |
| ▲ | acedTrex 3 days ago | parent | next [-] | | That is fascinating to me, I've never seen it generate that much code that is actually something I would consider correct. It's always wrong in some way. | |
| ▲ | LinXitoW 3 days ago | parent | prev [-] | | In my experience, if I have to issue more than 2 corrections, I'm better off restarting and beefing up the prompt or just doing it myself |
| |
| ▲ | djeastm 3 days ago | parent | prev [-] | | Standards are going to be as low as the market allows, I think. In some industries code quality is paramount; in others it's negligible and perhaps speed of development is the higher priority and the code is mostly disposable. | |
| |
| ▲ | nicce 3 days ago | parent | prev | next [-] | | > I build full stack web applications in node/.net/react, more importantly (I think) is that I work on a small startup and manage 3 applications myself. I think this is your answer. For example, React and JavaScript are extremely popular and mature. Are you using TypeScript and want to get the most out of the types, or are you accepting everything that the LLM gives as JavaScript? How much do you care whether the code is using "soon to be deprecated" functions or the most optimized loop/implementation? How about the project structure? In other cases, the more precision you need, the less effective the LLM is. | |
| ▲ | thanhhaimai 3 days ago | parent | prev | next [-] | | I work across the stack (frontend, backend, ML) - For FrontEnd or easy code, it's a speed up. I think it's more like 2x instead of 3x. - For my backend (hard trading algo), it has like 90% failure rate so far. There is just so much for it to reason through (balance sheet, lots, wash, etc). All agents I have tried, even on Max mode, couldn't reason through all the cases correctly. They end up thrashing back and forth. Gemini most of the time will go into the "depressed" mode on the code base. One thing I notice is that the Max mode on Cursor is not worth it for my particular use case. The problem is either easy (frontend), which means any agent can solve it, or it's hard, and Max mode can't solve it. I tend to pick the fast model over strong model. | |
| ▲ | bcrosby95 3 days ago | parent | prev | next [-] | | My current guess is it's how the programmer solves problems in their head. This isn't something we talk about much. People seem to find LLMs do well with well-spec'd features. But for me, creating a good spec doesn't take any less time than creating the code. The problem for me is the translation layer that turns the model in my head into something more concrete. As such, creating a spec for the LLM doesn't save me any time over writing the code myself. So if it's a one shot with a vague spec and that works that's cool. But if it's well spec'd to the point the LLM won't fuck it up then I may as well write it myself. | |
| ▲ | evantbyrne 3 days ago | parent | prev | next [-] | | The problem with these discussions is that almost nobody outside of the agency/contracting world seems to track their time. Self-reported data is already sketchy enough without layering on the issue of relying on distant memory of fine details. | |
| ▲ | dingnuts 3 days ago | parent | prev | next [-] | | You have small applications following extremely common patterns and using common libraries. Models are good at regurgitating patterns they've seen many times, with fuzzy find/replace translations applied. Try to build something like Kubernetes from the ground up and let us know how it goes. Or try writing a custom firmware for a device you just designed. Something like that. | |
| ▲ | andrepd 3 days ago | parent | prev | next [-] | | Self-reports are notoriously overexcited, real results are, let's say, not so stellar. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... | | |
| ▲ | logicprog 3 days ago | parent [-] | | Here's an in depth analysis and critique of that study by someone whose job is literally to study programmers psychologically and has experience in sociology studies: https://www.fightforthehuman.com/are-developers-slowed-down-... Basically, the study has a fuckton of methodological problems that seriously undercut the quality of its findings, and even assuming its findings are correct, if you look closer at the data, it doesn't show what it claims to show regarding developer estimations, and the story of whether it speeds up or slows down developers is actually much more nuanced and precisely mirrors what the developers themselves say in the qualitative quote questionnaire, and relatively closely mirrors what the more nuanced people will say here — that it helps with things you're less familiar with, that have scope creep, etc. a lot more, but is less or even negatively useful for the opposite scenarios — even in the worst case setting. Not to mention this is studying a highly specific and rare subset of developers, and they even admit it's a subset that isn't applicable to the whole. |
| |
| ▲ | carlhjerpe 3 days ago | parent | prev | next [-] | | I'm currently unemployed in the DevOps field (resigned and got a long vacation). I've been using various models to write various Kubernetes plug-ins and simple automation scripts. It's been a godsend implementing things which would require too much research otherwise; my ADHD context window is smaller than Claude's. Models are VERY good at Kubernetes since they have very anal (good) documentation requirements before merging. I would say my productivity gain is unmeasurable since I can produce things I'd ADHD out of unless I've got a whip up my rear. | |
| ▲ | epolanski 3 days ago | parent | prev | next [-] | | > Since so many claim the opposite The overwhelming majority of those claiming the opposite are a mixture of: - users with wrong expectations, such as AI's ability to do the job on its own with minimal effort from the user. They have marketers to blame. - users that have AI skill issues: they simply don't understand/know how to use the tools appropriately. I could provide countless examples from the importance of quality prompting, good guidelines, context management, and many others. They have only their laziness or lack of interest to blame. - users that are very defensive about their job/skills. Many feel threatened by AI taking their jobs or diminishing it, so their default stance is negative. They have their ego to blame. | |
| ▲ | dmitrygr 3 days ago | parent | prev | next [-] | | > For me it’s meant a huge increase in productivity, at least 3X. Quite possibly you are doing very common things that are often done and thus are in the training set a lot, while the parent post is doing something more novel that forces the model to extrapolate, which they suck at. | |
| ▲ | cambaceres 3 days ago | parent [-] | | Sure, I won’t argue against that. The more complex (and fun) parts of the applications I tend to write myself. The productivity gains are still real though. |
| |
| ▲ | byryan 3 days ago | parent | prev | next [-] | | That makes sense, especially if you're building web applications that are primarily "just" CRUD operations. If a lot of the API calls follow the same pattern and the application is just a series of API calls + React UI, then that seems like something an LLM would excel at. LLMs are also more proficient in TypeScript/JS/Python compared to other languages, so that helps as well. | |
| ▲ | squeaky-clean 3 days ago | parent | prev | next [-] | | I just want to point out that they only said agentic models were a negative, not AI in general. I don't know if this is what they meant, but I personally prefer to use a web or IDE AI tool and don't really like the agentic stuff compared to those. For me agentic AI would be a net positive against no-AI, but it's a net negative compared to other AI interfaces | |
| ▲ | darkmarmot 3 days ago | parent | prev | next [-] | | I work in distributed systems programming and have been horrified by the crap the AIs produce. I've found them to be quite helpful at summarizing papers and doing research, providing jumping off points. But none of the code I write can be scraped from a blog post. | |
| ▲ | qingcharles 3 days ago | parent | prev | next [-] | | On the right projects, definitely an enormous upgrade for me. Have to be judicious with it and know when it is right and when it's wrong. I think people have to figure out what those times are. For now. In the future I think a lot of the problems people are having with it will diminish. | |
| ▲ | datadrivenangel 3 days ago | parent | prev [-] | | How do you structure your applications for maintainability? |
|