camgunz 2 hours ago
I think SWEs are genuinely pretty shocked and awed that codegen models can code at all, let alone code well. My guess is a lot of the agita around this is that people thought "I can code, therefore I'm smart/special/etc." and then a machine comes along that can do pretty equivalent work, and they're entirely unmoored. I sympathize with that, and I don't mean to dismiss it, but that's not what I feel.

I really dislike this "doer vs. maker" binary stuff that comes up every now and again, as though everyone who thinks codegen models aren't perfect doesn't want to make anything. I really want to make things--good things--and I dislike the current hype wave behind codegen models because they often make it harder for me to make good things.

I've used Claude Code to build a few big things at work; I ask it questions ("where does this happen", "we have problem X, give me 3 potential causes", etc.); I have it review things before I post PRs; our code review bot finds real heisenbugs. I have mixed success with all of this, but even so I find it overall useful. I'd be irritated if some place I worked, past or present, told me I couldn't use Claude Code or the like.

That said, I've not gotten it to be useful in:

- building entire, complex features in brownfield projects
- solving systemic bugs
- system design/evolution
- feature/product design and planning
- replacing senior engineer code review

It will confidently tell you it's done these things, but when you actually force yourself through the mental slog of reviewing its output, you'll realize it's failed (you also have to be an expert to perform this analysis). Now, maybe it fails in an acceptable way; maybe only slight revision is required; maybe it one-shots the change and verifying success isn't a big mental slog. Those are the good cases. More annoying are the times it fails totally and obviously, but the real nightmares are when it fails totally, yet imperceptibly. It also sometimes can do (some of) these things!
But it's inconsistent, such that its successes largely serve to lower your guard against its failures. And the mental slog is real. The artifacts you have to produce/review/ensure the model adheres to are ponderous. The code generated is ponderous. Code review is even more tedious because there's no human mind behind the code, so you can't build a mental model of the author. Getting a codegen model to revise its work or take a different approach is very hit or miss. Revising the code yourself requires reading thousands and thousands of lines of generated code--again with no human behind it--and building a mental model of what's happening before you can work effectively, and that process is time-consuming and exhausting.

I'm also concerned about the second-order effects. Because switching into the often-required deep mental focus is very difficult (borderline painful), I've seen many, many people reach for LLMs in those moments instead, first a little, then entirely. I've watched people copy/paste API docs into Gemini prompts to explain them. I've watched people unable to find syntax errors in code paste it into ChatGPT to fix it. I'm confident I'm not the only person who's observed this, and it's a little maddening that it's not getting more play.

---

I'm not saying SWEs don't fail in similar ways. I've approved--and authored--human PRs that had insidious flaws with real consequences. I've been asked to "review" PRs pre-ChatGPT that were 10x the size they needed to be. I've seen people plagiarize code, or just copy/paste Stack Overflow constantly. The difference is we build process around these risks: everything from coding patterns, PR size limits, type systems, firing people, borderline ludicrous amounts of unit tests, CI/CD, design docs, staging environments, blue/green deploys, QA lists, etc. I hate all of it! It's a constant reminder of my flaws, and it stretches out the mean time to the dopamine squirt of released code.
I'd be the first person to give all this shit the axe. I would love to point Claude at the crushingly long list of PRs I have to review. But I can't, because it still has huge, huge flaws. Code review bots miss obvious problems, and they don't have enough context/knowledge about the system/bug/feature to perform a sufficiently comprehensive review. It would be a net time waste, because we'd then have to fix a bug in prod or revise an already-deployed feature/fix--things I like even less than code review, if you can believe it. These models cannot adequately replace humans in other parts of the SDLC either.

But, because pesky things like design and code review cap codegen models' velocity, our industry is "rethinking" it all, with no consideration of the models' flaws; "rethinking" here meaning "we're considering having an LLM handle all our code review, or not doing it at all". The only way to describe that is reckless disregard. It's unprofessional and unethical.

So, I think my grief isn't about "the craft". I don't think that's gone, and I don't think I'd care if it were. My grief is about the humiliation of our profession, the annihilation of our standards, and the betrayal of any representation we made to our users--indeed, to ourselves. We deserve software systems that do what they say they do, and up until recently I really thought we were working hard to get there. I don't think that anymore; like many other things in our era (community, truth, curiosity, generosity, trust, learning, rationality, practice, compassion), it has retreated in the face of some flavor of self-interested, shallow grift. I really don't know how or why this happened, but regardless of the cause, we truly are in a dark time.
ugtr3 an hour ago
Excellent post!

“I'm also concerned about the second-order effects. Because switching into the often-required deep mental focus is very difficult (borderline painful), I've seen many, many people reach for LLMs in those moments instead, first a little, then entirely. I've watched people copy/paste API docs into Gemini prompts to explain them. I've watched people unable to find syntax errors in code and paste it into ChatGPT to fix it. I'm confident I'm not the only person who's observed this, and it's a little maddening it's not getting more play.”

That's exactly why I stopped using LLMs. Then people turn around and say, “but.. you'll get left behind.” Yeah, nah. I value my ability to hold concepts and reason deeply and sit in those painful moments--I'm not letting go of conditioning that pays dividends over the long term.