| ▲ | aguimaraes1986 13 hours ago |
| This is the "lines of code per week" metric from the 90s, repackaged. "I'm doing more PRs" is not evidence that AI is working, it's evidence that you are merging more.
Whether that's good depends entirely on what you are merging.
I use AI every day too. But treating throughput of code going to production as a success metric, without any mention of quality, bugs, or maintenance burden, is exactly the kind of thinking developers used to push back on when management proposed it. Turns out we weren't opposed to bad metrics! We were just opposed to being measured!
Given the chance to pick our own, we jumped straight to the same nonsense. |
|
| ▲ | conwy 9 hours ago | parent | next [-] |
| FWIW, I've been using AI, but instead of "max # of lines/commits", I'm optimising for "min # of pr comments/iterations/bugs". My goal is to end up with less/simpler code and more/bigger impact. The real goal is business value, and ultimately human value. Optimise for that, using AI where it fits. Along those lines, some techniques I've been dabbling in:
1. Getting multiple agents to implement a requirement from scratch, then combining the best ideas from all of them with my own informed approach.
2. Gathering documentation (requirements, background info, glossaries, etc), targeting an Agent at it, and asking carefully selected questions for which the answers are likely useful.
3. Getting agents to review my code, abstracting review comments I agree with to a re-usable checklist of general guidelines, then using those guidelines to inform the agents in subsequent code reviews. Over time I hope this will make the code reviews increasingly well fitted to the code base and nature of the problems I work on. |
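Technique 3 above can be sketched as a small feedback loop. This is a hypothetical illustration, not the commenter's actual setup: the checklist file name, prompt format, and helper names are all assumptions.

```python
from pathlib import Path

# Hypothetical location for the accumulated review guidelines.
CHECKLIST = Path("review-guidelines.md")

def accept_review_comment(comment: str) -> None:
    """Append an agent review comment you agreed with, generalized into a
    reusable guideline, skipping duplicates."""
    existing = CHECKLIST.read_text() if CHECKLIST.exists() else ""
    if comment not in existing:
        with CHECKLIST.open("a") as f:
            f.write(f"- {comment}\n")

def build_review_prompt(diff: str) -> str:
    """Prepend the accumulated guidelines to the next review request, so
    reviews get increasingly fitted to the code base over time."""
    guidelines = CHECKLIST.read_text() if CHECKLIST.exists() else ""
    return (
        "Review this diff against these project guidelines:\n"
        f"{guidelines}\n--- diff ---\n{diff}"
    )
```

The key design choice is that the human stays in the loop: only comments you explicitly agree with are promoted into guidelines.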
| |
| ▲ | kaashif 3 hours ago | parent [-] | | The Goodhart's law effect there seems obvious - rather than code getting better, you might just become less rigorous in your reviews and stop commenting as much. You may not even realize your standards are dropping. |
|
|
| ▲ | scorpioxy 12 hours ago | parent | prev | next [-] |
| And the author has a blog post about burnout and anxiety. Maybe all of those things are related. Working to the point of making yourself sick should not be seen as a mark of pride, it is a sign that something is broken. Not necessarily the individual, maybe the system the individual is in. |
| |
| ▲ | cocoa19 11 hours ago | parent [-] | | I’m glad I’m not the only one that noticed this is madness. I find it crazy to build a complex system to juggle 10 different threads in your brain, including the complexity of the tool itself. |
|
|
| ▲ | browningstreet 12 hours ago | parent | prev | next [-] |
| Maybe author knows that too, but wants to talk about it nonetheless. First line of article: “Commits are a terrible metric for output, but they're the most visible signal I have.” |
| |
| ▲ | the_arun 10 hours ago | parent | next [-] | | Using AI we can make thousands of commits per day, so this metric becomes even more pointless in the age of AI. Increased sales, new subscription counts, reduced bug counts, fewer incidents, etc. can be real metrics. I'm sure I am preaching to the choir. | | |
| ▲ | tecleandor 10 hours ago | parent [-] | | I have coworkers committing tens or hundreds of thousands of "lines of code" a week, because they'll push whatever the AI gives them, including dependencies and virtualenvs, without any review. Of course, at the same time we're getting dozens of alerts a week about services deployed open to the Internet without authentication and full of outdated vulnerable libraries (LLMs will happily add two- or three-year-old dependencies to your lockfiles). | | |
| ▲ | duskdozer 5 hours ago | parent [-] | | Set the AIs off on those alerts and look at how many more alerts per week are now getting solved due to AI! | | |
|
| |
| ▲ | skydhash 12 hours ago | parent | prev [-] | | What about number of working features or system completeness? Current state vs desired state is fairly visible. | | |
| ▲ | 101011 12 hours ago | parent | next [-] | | How do you define system completeness? What if you ship one really big feature vs three really small ones? I would posit that you need extra context to obtain meaning from those metrics, which inherently makes them less visible. | | |
| ▲ | skydhash 11 hours ago | parent [-] | | System completeness can be defined from the product definition. The latter is where requirements and definitions of done come from. Working features are the most important thing, and most principles and techniques are about reducing the cost of getting there. |
| |
| ▲ | 12 hours ago | parent | prev [-] | | [deleted] |
|
|
|
| ▲ | williamcotton 12 hours ago | parent | prev | next [-] |
Lines of code are meaningful when taken in aggregate and useless as a metric for an individual's contributions. COCOMO, which considers lines of code, is generally accepted as being accurate (enough) at estimating the value of a software system, at least as far as courts in the US are concerned. https://en.wikipedia.org/wiki/COCOMO
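For context on what COCOMO actually computes: the basic model estimates development effort (cost, not value) from thousands of lines of code using published coefficients. A minimal sketch of basic COCOMO's published formulas:

```python
def cocomo_basic(kloc: float, mode: str = "organic") -> tuple[float, float]:
    """Basic COCOMO: estimate effort (person-months) and development
    schedule (calendar months) from size in thousands of lines of code."""
    # Published basic-COCOMO coefficients per project mode:
    # (effort multiplier a, effort exponent b, schedule exponent d)
    a, b, d = {
        "organic":       (2.4, 1.05, 0.38),
        "semi-detached": (3.0, 1.12, 0.35),
        "embedded":      (3.6, 1.20, 0.32),
    }[mode]
    effort = a * kloc ** b        # person-months
    schedule = 2.5 * effort ** d  # calendar months
    return effort, schedule
```

Note the nonlinear exponent: doubling the line count more than doubles the estimated effort, which is part of why padding LOC inflates these estimates.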
| |
| ▲ | sarchertech 12 hours ago | parent | next [-] | | No one has any idea how to estimate software value, so the idea that some courts in the US have used a wildly inaccurate system that considers LOC is so far away from evidence that LOC is useful for anything that I can’t believe you bothered including that. LOC is essentially only useful to give a ballpark estimate of complexity, and even then only if you compare orders of magnitude and only between similar programming languages and ecosystems. It’s certainly not useful for AI generated projects. Just look at OpenClaw. Last I heard it was something close to half a million lines of code. When I was in college we had a professor senior year who was obsessed with COCOMO. He required our final group project to be 50k LOC (He also required that we print out every line and turn it in). We made it, but only because we built a generator for the UI and made sure the generator was as verbose as possible. | |
| ▲ | post-it 12 hours ago | parent | prev | next [-] | | I think that's a "looking under the lamp post because that's where the light is" metric. I'm not sure most developers, managers, or owners care about the calculated dollar value of their codebase. They're not trading code on an exchange. By condensing all software into a scalar, you're losing almost all important information. I can see why it's important in court, obviously, since civil court is built around condensing everything into a scalar. | |
| ▲ | jonahx 12 hours ago | parent | prev | next [-] | | > Lines of code are meaningful when taken in aggregate The linked article does not demonstrate this. It establishes no causal link. One can obviously bloat LOC to an arbitrary degree while maintaining feature parity. Very generously, assuming good faith participants, it might reflect a kind average human efficiency within the fixed environment of the time. Carrying the conclusions of this study from the 80s into the LLM age is not justified scientifically. | |
| ▲ | kqr 2 hours ago | parent | prev | next [-] | | COCOMO estimates the cost of the software, not the value. The cost is only weakly correlated with value. | |
| ▲ | keeda 12 hours ago | parent | prev | next [-] | | > Lines of code are meaningful when taken in aggregate and useless as a metric for an individual’s contributions. Yes, and in fact a lot of the studies that show the impact of AI on coding productivity get dismissed because they use LoC or PRs as a metric and "everyone knows LoC/PR counts is a BS metric." But the better designed of these studies specifically call this out and explicitly design their experiments to use these as aggregate metrics. | |
| ▲ | renegade-otter 4 hours ago | parent | prev | next [-] | | I am writing a book! I used AI to write 1 billion words this morning! | |
| ▲ | BoorishBears 12 hours ago | parent | prev [-] | | > at least as far as how courts (in the US) are concerned. That's an anti-signal if we're being honest. |
|
|
| ▲ | SOLAR_FIELDS 5 hours ago | parent | prev | next [-] |
To me, commit volume and similar metrics are something that indicates AI adoption, nothing more. And for a lot of people right now that is the goal, however short- or long-sighted it might be. |
|
| ▲ | zahlman 13 hours ago | parent | prev | next [-] |
| > Turns out we weren't opposed to bad metrics! We were just opposed to being measured! Given the chance to pick our own, we jumped straight to the same nonsense. This seems like a distinction without a difference, unless there actually are any good metrics (which also requires them to be objectively and reliably quantifiable). I think most developers don't really want to measure themselves, it's just that pro-AI people think measurement is necessary to put forward a convincing argument that they've improved anything. |
| |
| ▲ | sodapopcan 13 hours ago | parent [-] | | The only time metrics have been useful to me in the past is when they are kept private to each team, which is to say that I do think they are useful for measuring yourself, but not for others to measure you. Taken over time, they can eventually give you a really good idea of what you can deliver. Sandbag a bit (ie, undershoot that number), communicate that to ye olde stakeholders, and everybody's happy that you can actually do what you say you'll do without being stressed out (obviously this doesn't work in startups). |
|
|
| ▲ | sailfast 9 hours ago | parent | prev | next [-] |
It’s not meaningless - it just shouldn’t be held up as the only thing. Sometimes having a couple of proxies is OK as long as you also look at value in other ways. /shrug |
|
| ▲ | scuff3d 12 hours ago | parent | prev | next [-] |
Multiple agents in parallel "working on different features" is where people lose me. I don't care how much friction you've eliminated from the loop, eventually that code has to be looked at. Trying to switch between 5 different feature branches and properly review the code, even with AI help, is going to eat up most if not all of the productivity improvements. The only way around it is to start pencil-whipping reviews. |
| |
| ▲ | renegade-otter 3 hours ago | parent [-] | | Yes. Your brain, your clear thinking and your focus are the ultimate scarce resource. Writing code is easy, but I review one large PR from a coworker, and I need a nap. Claiming that you have "ten agents writing code at night" is not the flex you think it is. That's just a recipe for burnout and bad design decisions. Stop running your agents and go touch grass. |
|
|
| ▲ | Sabu87 12 hours ago | parent | prev | next [-] |
I'm also trying everything to learn how to use Claude; everything is so new, and it keeps being upgraded. |
|
| ▲ | Rover222 10 hours ago | parent | prev | next [-] |
| Of course lines of code is a meaningful metric. It's not like the author said it's the ONLY meaningful metric. |
|
| ▲ | groby_b 12 hours ago | parent | prev [-] |
| Here's the thing every discussion around this tries to weasel around: All else being equal, yes, more PRs is a signal of productivity. It's not the only metric. But I'm more and more convinced that the people protesting any discussion of it are the ones who... don't ship a lot. Of course it matters in what code base. What size PR. How many bugs. Maintenance burden. Complexity. All of that doesn't go away. But that doesn't disqualify the metric, it just points out it's not a one-dimensional problem. And for a solo project, it's fairly easy to hold most of these variables relatively constant. Which means "volume went up" is a pretty meaningful signal in that context. |
| |
| ▲ | anukin 11 hours ago | parent | next [-] | | Can you define what “all else” means here? PRs or closed Jira tickets can be a metric of productivity only if they add to or improve the existing feature set of the product. If a PR introduces a feature with 10 bugs in other features and I have my agent swarm fix those in 10-20 PRs in a week, my productivity and delivery have both taken a hit. If any of these features went to prod, I have lost revenue as well. Shipping is not the same as shipping correctly with minimal introduction of bugs. | |
| ▲ | mememememememo 8 hours ago | parent | prev | next [-] | | Number of integration tests might be a good metric (until you announce that it is the metric; then, like every other metric, including profit, it becomes useless!) For profit failing as a metric, see: Enron. | |
| ▲ | sarchertech 11 hours ago | parent | prev | next [-] | | > All else being equal, yes, more PRs is a signal of productivity. Yeah but all else isn’t equal, so unless you’re measuring a whole lot more than PRs it’s completely meaningless. Even on a solo project, something as simple as I’m working with a new technology that I’m excited about is enough to drastically ramp up number of PRs. | |
| ▲ | SpicyLemonZest 7 hours ago | parent | prev [-] | | The problem is that these caveats, while tolerable in some contexts, make the metric impossible to interpret for something like Claude Code which is (I agree!) a huge change in how most software is developed. If you mostly get around on your feet, distance traveled in a day is a reasonable metric for how much exercise you got. It's true that it also matters how you walk and where you walk, but it would be pretty tedious to tell someone that a "3 mile run" is meaningless and they must track cardiovascular health directly. It's fine, it works OK for most purposes, not every metric has to be perfect. But once you buy a car, the metric completely decouples, and no longer points towards your original fitness goals even a tiny bit. It's not that cars are useless, or that driving has a magic slowdown factor that just so happens to compensate for your increased distance travelled. The distance just doesn't have anything to do with the exercise except by a contingent link that's been broken. |
|