| ▲ | devin 4 hours ago |
| > If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t. It is so embarrassing that LOC is being used as a metric for engineering output. |
|
| ▲ | ilikebits 4 hours ago | parent | next [-] |
| LOC is useful here not because it's a metric for output but because it's a metric for _understandability_. Reviewing 200 lines is a very different workload than reviewing 2000. |
| |
| ▲ | jazzypants 4 hours ago | parent | next [-] | | That's assuming the 200 lines are logical and consistent. Many of my most frustrating LLM bugs are caused by things that look right and are even supported by lengthy comments explaining their (incorrect) reasoning. | | |
| ▲ | mcmcmc 4 hours ago | parent [-] | | Ok? No one is saying that all LOC are equal. Ceteris paribus, 2000 lines is 10x more time consuming to review than 200 | | |
| ▲ | jazzypants 3 hours ago | parent | next [-] | | The point is that LOC is never a good metric for any aspect of determining the quality of code or the coder because it ignores the nuance of reality. It's impossible to generalize because the code can be either deceptively dense or unnecessarily bloated. The only thing that actually matters is whether the business objective is achieved without any unintended side effects. | | |
| ▲ | mcmcmc 3 hours ago | parent [-] | | > The only thing that actually matters is whether the business objective is achieved without any unintended side effects. Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process. Deceptively dense is what I’d call software engineers who can’t accept that the process is actually generalizable to a degree and that lines of code are one of the few tangible things that can be used as a metric. Can you deliver value without lines of code? | | |
| ▲ | jazzypants 2 hours ago | parent [-] | | > Objectives change; timeliness matters. The speed at which you deliver value is incredibly important, which is why it matters to measure your process. This assumes that shorter code is faster to write. To quote Blaise Pascal, "I would have written a shorter letter, but I did not have the time." > Can you deliver value without lines of code? No, but you can also depreciate value when you stuff a codebase full of bloated, bug-ridden code that no man or machine can hope to understand. | | |
| ▲ | mcmcmc 2 hours ago | parent [-] | | You seem determined to misinterpret. I’m not talking about LOC as a measure of productivity. The ratio of LOC needing review to the capacity of reviewers (using how many LOC can be read/reviewed over the sampling period) is what’s being discussed. Agentic AI/vibe coding has caused that ratio to increase and shows a bottleneck in the SDLC. It’s a proxy metric, get over yourself. “All models are wrong, some are useful”. What’s not useful is constantly bitching about how there’s no way to measure your work outside of the binary “is it done” every time process efficiency is brought up. | | |
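The ratio being described — code produced versus reviewer capacity — can be sketched in a few lines. All the numbers below are invented purely for illustration:

```python
# Back-of-the-envelope sketch of the review-bottleneck ratio discussed above.
# Team sizes and LOC-per-day figures are hypothetical, for illustration only.

def review_load_ratio(loc_produced_per_day: int, reviewers: int,
                      loc_reviewed_per_reviewer_per_day: int) -> float:
    """Ratio of code produced to review capacity; > 1.0 means a backlog grows."""
    capacity = reviewers * loc_reviewed_per_reviewer_per_day
    return loc_produced_per_day / capacity

# Pre-agent baseline: 5 devs x 200 LOC/day, 2 reviewers at 500 LOC/day each.
print(review_load_ratio(5 * 200, 2, 500))   # 1.0 -- review keeps pace
# With agents: the same 5 devs now emit 2,000 LOC/day each.
print(review_load_ratio(5 * 2000, 2, 500))  # 10.0 -- review is the bottleneck
```

The point of the proxy metric is visible in the second call: nothing about reviewer capacity changed, so the tenfold jump in production lands entirely on the review stage.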
| ▲ | jazzypants an hour ago | parent [-] | | Yes, reading this back, I definitely veered off-topic. I apologize. I still don't think that you can say how much time it will take to review code based on how many lines of code are involved, but my argument was not well crafted. I just hope that others can learn something from our discussion. Thank you for being patient with me, and I hope you have a good day! :) |
|
|
|
| |
| ▲ | embedding-shape 3 hours ago | parent | prev [-] | | > 2000 lines is 10x more time consuming to review than 200 Very far from the truth in practice; not every line of code is as difficult or easy to review as any other. | |
| ▲ | jimbokun 2 hours ago | parent | next [-] | | But why would the lines in the 2000 case be easier to review per line? | |
| ▲ | mcmcmc 2 hours ago | parent | prev [-] | | Holy shit, read the words I wrote. Ceteris Paribus. Assume the 200 lines and 2000 lines have a similar distribution of complexity. |
|
|
| |
| ▲ | moregrist 3 hours ago | parent | prev | next [-] | | It’s still a bad metric. I have worked with code where 1000s of lines are very straightforward and linear. I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack. The skills and effort required to review and understand those situations are quite different. One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning. | | |
| ▲ | lelandfe 3 hours ago | parent | next [-] | | There’s still a limit on how far one can drive in a day, no matter the road. | |
| ▲ | jimbokun 2 hours ago | parent | prev | next [-] | | The number of bugs tends to be linear in lines of code written, meaning fewer lines of code for the same functionality will have fewer bugs. So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200. Furthermore, how do you know the AI-generated lines are the open-highway lines of code and not the mountain-road ones? There might be hallucinations that pattern-match as perfectly reasonable but hide a hard-to-spot flaw. | |
| |
| ▲ | mrbnprck 3 hours ago | parent | prev [-] | | It's still possible to run any LLM in a loop and optimize for LoC while preserving the desired outcome. |
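The loop mrbnprck suggests could look roughly like this. Note that `llm_rewrite` and `run_tests` are hypothetical stand-ins, not a real API:

```python
# Hypothetical sketch of an LLM loop that shrinks LOC while preserving the
# desired outcome. `llm_rewrite` and `run_tests` are stand-in callables.

def minimize_loc(code: str, llm_rewrite, run_tests, rounds: int = 5) -> str:
    best = code
    for _ in range(rounds):
        candidate = llm_rewrite(
            f"Rewrite this to fewer lines without changing behavior:\n{best}"
        )
        # Accept only if behavior is preserved and the line count actually drops.
        if run_tests(candidate) and candidate.count("\n") < best.count("\n"):
            best = candidate
    return best
```

The test suite is what makes the optimization safe: a shorter candidate is only kept if it still passes, so LoC goes down while the wanted outcome is held constant.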
|
|
| ▲ | faizshah 3 hours ago | parent | prev | next [-] |
| I experimented with vibe coding (not looking at the code myself) and it produced around 10k LOC even after refactors etc. I rewrote the same program using my own brain and just using ChatGPT as Google and autocomplete (my normal workflow), and produced the same thing in 1500 LOC. The effort difference was not that significant either, tbh, although my hand-coded approach probably benefited from my having designed the vibe-coded one first, so I had already thought through what I wanted to build. |
| |
| ▲ | embedding-shape 3 hours ago | parent [-] | | Sounds like a great opportunity to understand your own development process and codify it in such detail that the agent can replicate how you work, ending up with less code that does the same thing. My experience was the same as yours when I started using agents for development about a year ago. Every time I noticed it did something less-than-optimal or just "not up to my standards", I'd hash out exactly what those things meant for me and add it to my reusable AGENTS.md, and the code the agent outputs today is fairly close to what I "naturally" write. | |
| ▲ | 8note 2 hours ago | parent [-] | | or go with this, and use the agent to prototype ideas, and write it yourself once you know what you want |
|
|
|
| ▲ | keeda 2 hours ago | parent | prev | next [-] |
| LoC is perfectly fine as a metric for engineering output. It is terrible as a standalone measure of engineering productivity, and the problems occur when one tries to use it as such. It's still useful, however, because it is the only metric that is instantly, intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications. As we know, within the same team working on the same product, a 1000 LoC diff could take less time than a 1 line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you can bet everyone would use it, but it's basically infeasible for this very reason. So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, team A working on product P at company C using tech stack T with specific software quality processes Q produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact. (As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.) |
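The N1-versus-N2 comparison keeda describes reduces to a simple delta over the same team. All the figures below are invented for illustration:

```python
# Rough sketch of the same-context comparison described above: one team
# measured before and after AI adoption. The daily LOC samples are made up.

def loc_delta(daily_loc_before: list[int], daily_loc_after: list[int]) -> float:
    """Average daily LOC with AI minus without, for the same team (N2 - N1)."""
    n1 = sum(daily_loc_before) / len(daily_loc_before)
    n2 = sum(daily_loc_after) / len(daily_loc_after)
    return n2 - n1

print(loc_delta([180, 220, 200], [1900, 2100, 2000]))  # 1800.0
```

Because team, product, stack, and process are all held fixed, the delta is attributable to the one thing that changed, which is exactly the A/B-style design the comment alludes to.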
|
| ▲ | root_axis 3 hours ago | parent | prev | next [-] |
| He's not using LOC as a metric, he's making an observation about the impact of a change in the typical volume of LOC. |
|
| ▲ | mcmcmc 4 hours ago | parent | prev | next [-] |
| Is it? The whole point of the article is that the rate of output for writing code has surpassed the rate at which it can be reviewed by humans. LOC as an input for software review makes a lot of sense, since you literally need to read each line. |
|
| ▲ | adtac 4 hours ago | parent | prev | next [-] |
| LOC is the worst metric for engineering output, except for all the others - Churchill |
| |
| ▲ | deadbabe 4 hours ago | parent [-] | | The number of times an engineer says "what the fuck" while reading code still seems like a reliable metric for code quality assessment. | |
|
|
| ▲ | etothet 4 hours ago | parent | prev | next [-] |
| Agreed. And LOC has historically been one of the things we've collectively fought against management using to evaluate a "productive" developer! |
| |
| ▲ | ButyTh0 3 hours ago | parent [-] | | Why? We should have gone the other way: generated a lot of code and demanded pay raises. Look at the LOC I cranked out! The company is now in my debt! If they weren't going to care enough as managers to learn, and "line go up" is all that matters to them, then making all lines go up = winning. You all think there's more to this than performative barter for coin to spend on food/shelter. | |
| ▲ | embedding-shape 3 hours ago | parent [-] | | Because not everyone is just out to earn the most money; some people also want to enjoy the workplace where they work. Personally, the quality of the codebase and infrastructure matters a lot for how much you enjoy working in it, and I'd much rather work in a codebase I enjoy and earn half than in a codebase made by cranking out as many LOC as possible and earn double. Although this requires you to take pride in your profession and what you do. | |
| ▲ | ButyTh0 2 hours ago | parent [-] | | All of human agency must prop up the vanity of you. Of all people. Got it. ...ok fine; lack of political action to put us all on the hook for your healthcare is your choice to take a gamble on a paycheck. It's a choice to say your own existence is not owed the assurance of healthcare. So I will honor your choice and not care you exist. |
|
|
|
|
| ▲ | hungryhobbit 3 hours ago | parent | prev | next [-] |
| Humans are also incredibly varied and different. Do you reject all stats that count the number of people involved (e.g. "2 million people protested X") as "embarrassing" ... because they lump incredibly varied people together and pretend they're equal? |
|
| ▲ | vrganj 3 hours ago | parent | prev | next [-] |
| I read somewhere that measuring software engineering output by LoC is like measuring aerospace engineering by pounds added to the plane and I thought that was an apt comparison. |
|
| ▲ | dyauspitr 2 hours ago | parent | prev | next [-] |
| Honestly, it’s more like 200 to 100,000 lines of pretty decent quality code at this point. |
|
| ▲ | estimator7292 4 hours ago | parent | prev | next [-] |
| At least "mentions of LOC" is now a great metric for "how clueless is this person" |
|
| ▲ | kashyapc 4 hours ago | parent | prev [-] |
| Totally. I thought Simon was wiser than this; even he couldn't resist getting swept up by breathless hype. The moment you start typing "LOC as a metric", alarm bells should go off in your head. |
| |
| ▲ | simonw 2 hours ago | parent | next [-] | | This was a podcast, not a pre-scripted talk. I suggest listening to the audio version - it makes it more clear that this was thinking out loud, not carefully considering every word. | | |
| ▲ | kashyapc an hour ago | parent [-] | | I see, fair point. Sorry for taking a dig at you. Please know that I do appreciate a lot of the work that you do. I was just worried for a moment when reading that bit. |
| |
| ▲ | Daishiman 3 hours ago | parent | prev [-] | | LOC is very much an effective metric for general productivity for the median feature. You can't code-golf most lines of code out of existence. We're also assuming the LOC is vibe-coded by competent engineers who should be able to tell when something is overengineered. | |
|