| ▲ | multisport 2 days ago |
| What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects. If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend, build on top of, or depend on their work, then sure, Opus 4.5 could replace them. The hard thing about engineering is not "building a thing that works", it's building it the right way, in an easily understood way, in a way that's easily extensible. No doubt I could give Opus 4.5 "build me an XYZ app" and it will do well. But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right". Any non-technical person might read that and go "if it works it works", but any reasonable engineer will know that that's not enough. |
|
| ▲ | redhale 2 days ago | parent | next [-] |
| Not necessarily responding to you directly, but I find this take to be interesting, and I see it every time an article like this makes the rounds. Starting back in 2022/2023:
- (~2022) It can auto-complete one line, but it can't write a full function.
- (~2023) Ok, it can write a full function, but it can't write a full feature.
- (~2024) Ok, it can write a full feature, but it can't write a simple application.
- (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product.
- (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term.
It's pretty clear to me where this is going. The only question is how long it takes to get there. |
| |
| ▲ | arkensaw 2 days ago | parent | next [-] | | > It's pretty clear to me where this is going. The only question is how long it takes to get there. I don't think it's a guarantee. All of the things it can do from that list are greenfield, they just have increasing complexity. The problem comes because even in agentic mode, these models do not (and I would argue, can not) understand code or how it works; they just see patterns and generate a plausible-sounding explanation or solution. Agentic mode means they can try/fail/try/fail/try/fail until something works, but without understanding the code, especially of a large, complex, long-lived codebase, they can unwittingly break something without realising - just like an intern or newbie on the project, which is the most common analogy for LLMs, with good reason. | | |
| ▲ | namrog84 2 days ago | parent [-] | | While I do agree with you, to play the counterpoint advocate: what if we get to the point where all software is basically created 'on the fly' as greenfield projects, as needed? And you never need to have a complex, large, long-lived codebase? It is probably incredibly wasteful, but ignoring that, could it work? | | |
| ▲ | techblueberry 14 hours ago | parent | next [-] | | I don't think so. I don't think this is how human brains work, and you would have too many problems trying to balance things out. I'm thinking specifically of something like a complex distributed system. There are a lot of tweaks and iterations you need for things to work with each other. But then maybe this changes what a "codebase" is. If a codebase is just a structured set of specs that compile to code, a la TypeScript -> JavaScript, sure, but then it's still a long-lived <blank>. But maybe you would have to elaborate on what "creating software on the fly" looks like, because I'm sure there's a definition where the answer is yes. | |
| ▲ | fwip a day ago | parent | prev | next [-] | | That sounds like an insane way to do anything that matters. Sure, create a one-off app to post things to your Facebook page. But a one-off app for the OS it's running on? Freshly generating the code for your bank transaction rules? Generating an authorization service that gates access to your email? The only reason it's quick to create green-field projects is because of all these complex, large, long-lived codebases that it's gluing together. There's ample training data out there for how to use the Firebase API, the Facebook API, OS calls, etc. Without those long-lived abstraction layers, you can't vibe out anything that matters. | | |
| ▲ | theshrike79 a day ago | parent [-] | | In Japan, buildings (apartments) aren't built to last forever. They are built with a specific age in mind. They acknowledge the fact that houses are depreciating assets whose value tends toward zero (lim -> 0). The only reason we don't do that with code (or didn't use to do it) was because rewriting from scratch NEVER worked[0]. And large-scale refactors take massive amounts of time and resources, so much so that there are whole books written about how to do it. But today, trivial to simple applications can be rewritten from spec or from scratch in an afternoon with an LLM. And even pretty complex parsers can be ported provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small to medium size application from one language to another using the previous app as the "spec". [0] https://www.joelonsoftware.com/2000/04/06/things-you-should-... [1] https://simonwillison.net/2025/Dec/15/porting-justhtml/ | | |
| ▲ | techblueberry 14 hours ago | parent | next [-] | | > But today, trivial to simple applications can be rewritten from spec or from scratch in an afternoon with an LLM. And even pretty complex parsers can be ported provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small to medium size application from one language to another using the previous app as the "spec". This seems like a sort of, I dunno, chicken-and-egg thing. The _reason_ you don't rewrite code is that it's hard to know that you truly understand the spec. If you could perfectly understand the spec then you could rewrite the code, but then what is the software: the code, or the spec that writes the code? So if you built code A from spec, rebuilding it from spec doesn't qualify as a rewrite, it's just a recompile. If you're trying to fundamentally build a new application from spec when the old application was written by hand, you're going to run into the same problems you have in a normal rewrite. We already have an example of this: TypeScript applications are basically rewritten every time you recompile TypeScript to JavaScript. TypeScript isn't the executed code, it's a spec. edit: I think I missed that you said rewrite in a different language; then yeah, fine, you're probably right, but I don't think most people are architecture-agnostic when they talk about rewrites. The point of a rewrite is to keep the good stuff and lose a lot of the bad stuff. If you're using the original app as a spec to rewrite in a new language, then fine, yeah, LLMs may be able to do this relatively trivially. | |
| ▲ | fwip a day ago | parent | prev [-] | | Sure, and the buildings are built to a slowly-evolving code, using standard construction techniques, operating as a predictable building in a larger ecosystem. The problem with "all software" being AI-generated is that, to use your analogy, the electrical standards, foundation, and building materials have all been recently vibe-coded into existence, and none of your construction workers are certified in any of it. |
|
| |
| ▲ | damethos a day ago | parent | prev [-] | | I have the same questions in my head lately. |
|
| |
| ▲ | bayindirh 2 days ago | parent | prev | next [-] | | Well, the first 90% is easy, the hard part is the second 90%. Case in point: Self driving cars. Also, consider that we need to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders. | | |
| ▲ | throwthrowuknow 2 days ago | parent | next [-] | | Even if Opus 4.5 is the limit it’s still a massively useful tool. I don’t believe it’s the limit though for the simple fact that a lot could be done by creating more specialized models for each subdomain i.e. they’ve focused mostly on web based development but could do the same for any other paradigm. | | |
| ▲ | emodendroket 2 days ago | parent | next [-] | | That's a massive shift in the claim though... I don't think anyone is disputing that it's a useful tool; just the implication that because it's a useful tool and has seen rapid improvement that implies they're going to "get all the way there," so to speak. | |
| ▲ | bayindirh 2 days ago | parent | prev [-] | | Personally, I'm not against LLMs or AI itself, but considering how these models are built and trained, I refuse to use tools built on others' work without or against their consent (esp. GPL/LGPL/AGPL, Non-Commercial / No-Derivatives CC licenses, and Source Available licenses). Of course the tech will be useful and ethical if these problems are solved, or decided to be solved, the right way. | |
| ▲ | ForHackernews 2 days ago | parent [-] | | We just need to tax the hell out of the AI companies (assuming they are ever profitable) since all their gains are built on plundering the collective wisdom of humanity. | | |
| ▲ | thfuran 2 days ago | parent | next [-] | | I don’t think waiting for profitability makes sense. They can be massively disruptive without much profit as long as they spend enough money. | |
| ▲ | encyclopedism 2 days ago | parent | prev [-] | | AI companies and corporations in general control your politicians so taxing isn't going to happen. |
|
|
| |
| ▲ | literalAardvark 2 days ago | parent | prev | next [-] | | They're not blenders. This is clear from the fact that you can distill the logic ability from a 700b parameter model into a 14b model and maintain almost all of it. You just lose knowledge, which can be provided externally, and which is the actual "pirated" part. The logic is _learned_ | | |
| ▲ | encyclopedism 2 days ago | parent | next [-] | | It hasn't learned any LOGIC. It has 'learned' patterns from the input. | | | |
| ▲ | bayindirh 2 days ago | parent | prev [-] | | Are there any recent publications about it so I can refresh myself on the matter? | | |
| ▲ | D-Machine 2 days ago | parent [-] | | You won't find any trustworthy papers on the topic because GP is simply wrong here. That models can be distilled has no bearing whatsoever on whether a model has learned actual knowledge or understanding ("logic"). Models have always learned sparse/approximately-sparse and/or redundant weights, but they are still all doing manifold-fitting. The resulting embeddings from such fitting reflect semantics and semantic patterns. For LLMs trained on the internet, the semantic patterns learned are linguistic, which are not just strictly logical, but also reflect emotional, connotational, conventional, and frequent patterns, all of which can be illogical or just wrong. While linguistic semantic patterns are correlated with logical patterns in some cases, this is simply not true in general. |
|
| |
| ▲ | 2 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | mcfedr 2 days ago | parent | prev | next [-] | | I like to think of LLMs as random number generators with a filter | |
| ▲ | rat9988 2 days ago | parent | prev [-] | | > Well, the first 90% is easy, the hard part is the second 90% You'd need to prove that this assertion applies here. I understand that you can't deduce the future rate of gains from the past, but you also can't state this as a universal truth. | |
| ▲ | bayindirh 2 days ago | parent | next [-] | | No, I don't need to. Self-driving cars are the most recent and biggest example sans LLMs. The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline. Knowledge engineering has a notion called "covered/invisible knowledge", which points to the small things we do unknowingly but which change the whole outcome. None of the models (even AI in general) can capture this. We can say it's the essence of being human, or the tribal knowledge which makes experienced workers who they are, or what makes mom's rice taste that good. Considering these are highly individualized and unique behaviors, a model based on averaging everything can't capture this essence easily, if ever, without extensive fine-tuning for/with that particular person. | |
| ▲ | damethos a day ago | parent | next [-] | | "covered/invisible knowledge" aka tacit knowledge | | |
| ▲ | bayindirh a day ago | parent [-] | | Yeah, I failed to remember the term while writing the comment. Thanks! |
| |
| ▲ | rat9988 2 days ago | parent | prev | next [-] | | Self-driving cars are not a proof. They only show that quick early gains don't necessarily mean you'll get to 100% fast. It doesn't prove the same will necessarily happen here. |
| ▲ | enraged_camel 2 days ago | parent | prev | next [-] | | >> No, I don't need to. Self-driving cars are the most recent and biggest example sans LLMs. Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid. >> The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline. Sure, but the question is not "how long does it take for LLMs to get to 100%". The question is, how long does it take for them to become as good as, or better than, humans. And that threshold happens way before 100%. | |
| ▲ | bayindirh 2 days ago | parent [-] | | >> Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid. Doesn't matter, because if we're talking about AI models, no (type of) model reaches 100% linearly, or 100% ever. For example, recognition models run with probabilities. Like Tesla's Autopilot (TM), which loves to hit rolled-over vehicles because it has not seen enough vehicle underbodies to classify them. Same for scientific classification models. They emit probabilities, not certain results. >> Sure, but the question is not "how long does it take for LLMs to get to 100%" I never claimed that a model needs to reach a proverbial 100%. >> The question is, how long does it take for them to become as good as, or better than, humans. They can be better than humans for certain tasks. They have actually been better than humans at some tasks since the 70s, but we like to disregard that to romanticize current improvements. However, I don't believe current or any generation of AIs can be better than humans at anything and everything, at once. Remember: no machine can construct something more complex than itself. >> And that threshold happens way before 100%. Yes, and I consider that "threshold" as "complete", if they can ever reach it for certain tasks, not "any" task. |
| |
| ▲ | thfuran 2 days ago | parent | prev [-] | | >None of the models (even AI in general) can capture this None of the current models maybe, but not AI in general? There’s nothing magical about brains. In fact, they’re pretty shit in many ways. | | |
| ▲ | bayindirh 2 days ago | parent | next [-] | | A model trained on a very large corpus can't, because these behaviors are different or specialized enough that they cancel each other out in most cases. You can forcefully fine-tune a model with a singular person's behavior up to a certain point, but I'm not sure that even that can capture the subtlest behaviors or decision mechanisms, which are generally the most important ones (the ones we call gut feeling or instinct). OTOH, while I won't call the human brain perfect, the things we label "shit" generally turn out to be very clever and useful optimizations to work around its own limitations, so I regard the human brain more highly than most AI proponents do. Also we shouldn't forget that we don't know much about how that thing works. We only guess and try to model it. Lastly, seeking perfection in numbers and charts, or in the engineering sense, is misunderstanding nature and doing a great disservice to it, but this is a subject for another day. | |
| ▲ | emodendroket 2 days ago | parent | prev [-] | | The understanding of the brain is far from complete whether they're "magical" or "shit." | | |
|
| |
| ▲ | sanderjd 2 days ago | parent | prev | next [-] | | I read the comment more as "based on past experience, it is usually the case that the first 90% is easier than the last 10%", which is the right base case expectation, I think. That doesn't mean it will definitely play out that way, but you don't have to "prove" things like this. You can just say that they tend to be true, so it's a good expectation to think it will probably be true again. | |
| ▲ | rybosworld 2 days ago | parent | prev [-] | | The saying is more or less treated as a truism at this point. OP isn't claiming something original and the onus of proving it isn't on them imo. I've heard this same thing repeated dozens of times, and for different domains/industries. It's really just a variation of the 80/20 rule. |
|
| |
| ▲ | PunchyHamster 2 days ago | parent | prev | next [-] | | Note that blog posts rarely show the 20 other times it failed to build something, only the one time it happened to work. We've had the same progression with self-driving cars, and they have also been stuck on the last 10% for the last 5 years | |
| ▲ | redhale a day ago | parent | next [-] | | I agree with your observation, but not your conclusion. The 20 times it failed basically don't matter -- they are branches that can just be thrown away, and all that was lost is a few dollars on tokens (ignoring the environmental impact, which is a different conversation). As long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better. And I don't know why people always jump to self-driving cars as the analogy as a negative. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations. But they basically work and they're amazing. I feel similarly about agentic development, plus in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around. | |
| ▲ | theshrike79 a day ago | parent | prev [-] | | With "art" we're now at a point where I can get 50 variations of an image prompt within seconds from an LLM. Does it matter that 49 of them "failed"? It cost me fractions of a cent, so not really. If every one of the 50 variants was drawn by a human and iterated over days, there would've been a major cost attached to every image and I most likely wouldn't have asked for 50 variations anyway. It's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. A complete waste of time and money with humans, but with LLMs you can just do it. |
| |
| ▲ | Scea91 2 days ago | parent | prev | next [-] | | > - (~2023) Ok, it can write a full function, but it can't write a full feature. The trend is definitely here, but even today, heavily depends on the feature. While extra useful, it requires intense iteration and human insight for > 90% of our backlog. We develop a cybersecurity product. | |
| ▲ | sanderjd 2 days ago | parent | prev | next [-] | | Yeah maybe, but personally it feels more like a plateau to me than an exponential takeoff, at the moment. And this isn't a pessimistic take! I love this period of time where the models themselves are unbelievably useful, and people are also focusing on the user experience of using those amazing models to do useful things. It's an exciting time! But I'm still pretty skeptical of "these things are about to not require human operators in the loop at all!". | | |
| ▲ | throwthrowuknow 2 days ago | parent [-] | | I can agree that it doesn’t seem exponential yet but this is at least linear progression not a plateau. | | |
| ▲ | sanderjd 2 days ago | parent [-] | | Linear progression feels slower (and thus more like a plateau) to me than the end-of-2022 through end-of-2024 period. The question in my mind is where we are on the s-curve. Are we just now entering hyper-growth? Or are we starting to level out toward maturity? It seems like it must still be hyper-growth, but it feels less that way to me than it did a year ago. I think in large part my sense is that there are two curves happening simultaneously, but at different rates. There is the growth in capabilities, and then there is the growth in adoption. I think it's the first curve that seems to me to have slowed a bit. Model improvements seem both amazing and also less revolutionary to me than they did a year or two ago. But the other curve is adoption, and I think that one is way further from maturity. The providers are focusing more on the tooling now that the models are good enough. I'm seeing "normies" (that is, non-programmers) starting to realize the power of Claude Code in their own workflows. I think that's gonna be huge and is just getting started. |
|
| |
| ▲ | EthanHeilman 2 days ago | parent | prev | next [-] | | I haven't seen an AI successfully write a full feature into an existing codebase without substantial help; I don't think we are there yet. > The only question is how long it takes to get there. This is the question, and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real-world tasks probably fit into a complexity hierarchy similar to computational complexity. One of the reasons that the AI predictions made in the 1950s for the 1960s did not come to be was because we assumed problem difficulty scaled linearly. Double the computing speed, get twice as good at chess or twice as good at planning an economy. The separation of P and NP put an end to those predictions. It is likely that current predictions will run into similar separations. It is probably the case that if you made a human 10x as smart they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence; they are not 10x more intelligent, rather they have more knowledge and wisdom. |
| ▲ | kubb 2 days ago | parent | prev | next [-] | | Each of these years we’ve had a claim that it’s about to replace all engineers. By your logic, does it mean that engineers will never get replaced? | |
| ▲ | HarHarVeryFunny 2 days ago | parent | prev | next [-] | | Sure, eventually we'll have AGI, then no worries, but in the meantime you can only use the tools that exist today, and dreaming about what should be available in the future doesn't help. I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next LLM-to-AGI step, where it becomes capable of using human-level judgement and reasoning, etc., to become a developer, not just a coding tool. |
| ▲ | ugurs 2 days ago | parent | prev | next [-] | | Ok, it can create a long-lived complex codebase for a product that is extensible and scalable over the long term, but it doesn't have cool tattoos and can't fancy a matcha | |
| ▲ | mjr00 2 days ago | parent | prev | next [-] | | This is disingenuous because LLMs were already writing full, simple applications in 2023.[0] They're definitely better now, but it's not like ChatGPT 3.5 couldn't write a full simple todo list app in 2023. There were a billion blog posts talking about that and how it meant the death of the software industry. Plus I'd actually argue more of the improvements have come from tooling around the models rather than what's in the models themselves. [0] eg https://www.youtube.com/watch?v=GizsSo-EevA | | |
| ▲ | blitz_skull 2 days ago | parent [-] | | What LLM were you using to build full applications in 2023? That certainly wasn’t my experience. | | |
| ▲ | mjr00 2 days ago | parent [-] | | Just from googling, here's a video "Use ChatGPT to Code a Full Stack App" from May 18, 2023.[0] There's a lot of non-ergonomic copy and pasting but it's definitely using an LLM to build a full application. [0] https://www.youtube.com/watch?v=GizsSo-EevA | | |
| ▲ | blitz_skull 2 days ago | parent [-] | | That's not at all what's being discussed in this article. We copy-pasted from SO before this. This article is talking about 99% fully autonomous coding with agents, not copy-pasting 400 times from a chat bot. | | |
| ▲ | mjr00 2 days ago | parent [-] | | Hi, please re-read the parent comment again, which was claiming > Starting back in 2022/2023: > - (~2022) It can auto-complete one line, but it can't write a full function. > - (~2023) Ok, it can write a full function, but it can't write a full feature. This was a direct refutation, with evidence, that in 2023 people were not claiming that LLMs "can't write a full feature", because, as demonstrated, people were already building full applications with it at the time. This obviously is not talking exclusively about agents, because agents did not exist in 2022. | | |
| ▲ | redhale a day ago | parent [-] | | I get your point, but I'll just say that I did not intend my comment to be interpreted so literally. Also, just because SOMEONE planted a flag in 2023 saying that an LLM could build an app certainly does NOT mean that "people were not claiming that LLMs "can't write a full feature"". People in this very thread are still claiming LLMs can't write features. Opinions vary. |
|
|
|
|
| |
| ▲ | itsthecourier 2 days ago | parent | prev [-] | | I use it on a 10-year-old codebase; I need to explain where to get context, but it successfully works 90% of the time |
|
|
| ▲ | FloorEgg 2 days ago | parent | prev | next [-] |
There are two types of right/wrong ways to build: the context-specific right/wrong way to build something, and an overly generalized, engineer-specific right/wrong way to build things. I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case. I remember cases where a team of engineers built something the "right" way but it turned out to be the wrong thing. (A well-engineered thing no one ever used.) Sometimes hacking something together messily to confirm it's the right thing to be building is the right way. Then making sure it's secure, then finally paying down some technical debt to make it more maintainable and extensible. Where I see really silly problems is when engineers over-engineer from the start before it's clear they are building the right thing, or when management never lets them clean up the codebase to make it maintainable or extensible when it's clear it is the right thing. There's always a balance/tension, but it's when things go too far one way or another that I see avoidable failures. |
| |
| ▲ | ozim 2 days ago | parent | next [-] | | *I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case.* Gosh, I am so tired of that one - someone had a case that burned them in some previous project and now his life mission is to prevent that from happening ever again, and there is no argument they will take. Then you get up to 10 engineers on a typical team, plus team rotation, and you end up with all kinds of "we have to do it right because we had to pull an all-nighter once, 5 years ago" baked into the system. The not-fun part is that a lot of business/management people "expect" a perfect solution right away - there are some reasonable ones who understand you need some iteration. | |
| ▲ | mrheosuper 2 days ago | parent [-] | | >someone had a case that burned them in some previous project and now his life mission is to prevent that from happening ever again Isn't that what makes them senior? If you don't want that behaviour, just hire a bunch of fresh grads. | |
| ▲ | lukan 2 days ago | parent | next [-] | | No, extrapolating from one bad experience to a universal approach does not make anyone senior. There are situations where it applies and situations where it doesn't. Having the experience to see what applies in this new context is what senior (usually) means. | |
| ▲ | sanderjd 2 days ago | parent [-] | | The people I admire most talk a lot more about "risk" than about "right vs. wrong". You can do that thing that caused that all-nighter 5 years ago, it isn't "wrong", but it is risky, and the person who pulled that all-nighter has useful information about that risk. It often makes sense to accept risks, but it's always good to be aware that you're doing so. | | |
| ▲ | yurishimo 2 days ago | parent [-] | | It's also important to consider the developers' risk tolerance. It's all fine and dandy that the project manager is okay with the risk, but what if none of the developers are? Or one senior dev is okay with it, but the 3 who actually work the on-call queue are not? I don't get paid extra for after-hours incidents (usually we just trade time), so it's well within my purview to decide when to take on extra risk. Obviously, this is not ideal, but I don't make the on-call rules and my ability to change them is not a factor. | |
| ▲ | sanderjd 2 days ago | parent [-] | | I don't think of this as a project manager's role, but an engineering manager's role. The engineers on the team (especially the senior engineers) should be identifying the risks, and the engineering managers should be deciding whether they are tolerable. That includes risks like "the oncall is awful and morale collapses and everyone quits". It's certainly the case that there are managers who handle those risks poorly, but that's just bad management. |
|
|
| |
| ▲ | ozim 2 days ago | parent | prev [-] | | Nope, not realizing something doesn't apply and not being able to take in arguments is cargo culting, not being a senior. |
|
| |
| ▲ | yourapostasy 2 days ago | parent | prev | next [-] | | > ...multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. I usually resolve this by putting on the table the consequences and their impacts upon my team that I’m concerned about, and my proposed mitigation for those impacts. The mitigation always involves the other proposer’s team picking up the impact remediation. In writing. In the SOP’s. Calling out the design decision by day of the decision to jog memories and names of those present that wanted the design as the SME’s. Registered with the operations center. With automated monitoring and notification code we’re happy to offer. Once people are asked to put accountable skin in the sustaining operations, we find out real fast who is taking into consideration the full spectrum end to end consequences of their decisions. And we find out the real tradeoffs people are making, and the externalities they’re hoping to unload or maybe don’t even perceive. | | |
| ▲ | gleenn 2 days ago | parent [-] | | That's awesome, but I feel like half the time most people aren't in the position to add requirements so a lot of shenanigans still happens, especially in big corps |
| |
| ▲ | kalaksi 2 days ago | parent | prev | next [-] | | > I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case. My first thought was that you probably also have different biases, priorities and/or taste. As always, this is probably very context-specific and requires judgement to know when something goes too far. It's difficult to know the "most correct" approach beforehand. > Sometimes hacking something together messily to confirm it's the right thing to be building is the right way. Then making sure it's secure, then finally paying down some technical debt to make it more maintainable and extensible. I agree that sometimes it is, but in other cases my experience has been that when something is done, works and is used by customers, it's very hard to argue about refactoring it. Management doesn't want to waste hours on it (who pays for it?) and doesn't want to risk breaking stuff (or changing APIs) when it works. It's all reasonable. And when some time passes, the related intricacies, bigger picture and initially floated ideas fade from memory. Now other stuff may depend on the existing implementation. People get used to the way things are done. It gets harder and harder to refactor things. Again, this probably depends a lot on a project and what kind of software we're talking about. > There's always a balance/tension, but it's when things go too far one way or another that I see avoidable failures. I think balance/tension describes it well and good results probably require input from different people and from different angles. | |
| ▲ | Ericson2314 2 days ago | parent | prev [-] | | I know what you are talking about, but there is more to life than just product-market fit. Hardly any of us are working on Postgres, Photoshop, Blender, etc., but it's not just cope to wish we were. It's good to think about the needs of business and the needs of society separately. Yes, the thing needs users, or no one is benefiting. But it also needs to do good for those users, and ultimately, at the highest caliber, craftsmanship starts to matter again. There are legitimate reasons for the startup ecosystem to focus firstly and primarily on getting the users/customers. I'm not arguing against that. What I am arguing is why does the industry need to be dominated by startups in terms of the bulk of the products (not bulk of the users). It begs the question of how much societally-meaningful programming is waiting to be done. I'm hoping for a world where more end users code (vibe or otherwise) and solve their own problems with their own software. I think that will make for a smaller, more elite software industry that is more focused on infrastructure than last-mile value capture. The question is how to fund the infrastructure. I don't know, except for the most elite projects, which is not good enough for the industry (even this hypothetical smaller one) on the whole. | |
| ▲ | sanderjd 2 days ago | parent | next [-] | | > I'm hoping for a world where more end users code (vibe or otherwise) and the solve their own problems with their own software. I think that will make more a smaller, more elite software industry that is more focused on infrastructure than last-mile value capture. Yes! This is what I'm excited about as well. Though I'm genuinely ambivalent about what I want my role to be. Sometimes I'm excited about figuring out how I can work on the infrastructure side. That would be more similar to what I've done in my career thus far. But a lot of the time, I think that what I'd prefer would be to become one of those end users with my own domain-specific problems in some niche that I'm building my own software to help myself with. That sounds pretty great! But it might be a pretty unnatural or even painful change for a lot of us who have been focused for so long on building software tools for other people to use. | |
| ▲ | swat535 2 days ago | parent | prev | next [-] | | Users will not care about the quality of your code, or the backend architecture, or your perfectly strongly typed language. They only care about their problems and treat their computers like appliances. They don't care if it takes 10 seconds or 20 seconds. They don't even care if it has ads, popups, and junk.
They are used to bloatware and will gladly open their wallets if the tool is helping them get by. It's an unfortunate reality, but there it is: software is about money and solving problems. Unless you are working on a mission-critical system that affects people's health or financial data, none of those things matter much. | |
| ▲ | Ericson2314 2 days ago | parent [-] | | I know the customers couldn't care less about the quality of the code they see. But the idea that they don't care about software being bad/laggy/bloated ever, because it "still solves problems", doesn't stand up to scrutiny as an immutable fact of the universe. Market conditions can change. I'm banking on a future where, if users feel they can (perhaps vibe-) code their own solutions, they are far less likely to open their wallets for our bloatware solutions. Why pay exorbitant rents for shitty SaaS if you can make your own thing ad-free, exactly to your own mental spec? I want the "computers are new, programmers are in short supply, customer is desperate" era we've had in my lifetime so far to come to a close. |
| |
| ▲ | saxenaabhi 2 days ago | parent | prev [-] | | > There are legitimate reasons for the startup ecosystem to focus firstly and primarily on getting the users/customers. I'm not arguing against that. What I am arguing is why does the industry need to be dominated by startups in terms of the bulk of the products (not bulk of the users). It begs the question of how much societally-meaningful programming is waiting to be done. You slipped in "societally-meaningful" and I don't know what it means, and I don't want to debate the merits/demerits of socialism/capitalism. However, I think lots of software needs to be written because, in my estimation, with AI/LLM/ML it'll generate value. And then you have lots of software that needs to be rewritten as firms/technologies die and new firms/technologies are born. | |
| ▲ | Ericson2314 2 days ago | parent [-] | | I didn't mean to do some snide anti-capitalism. Making new Postgreses and Blenders is really hard. I don't think the startup ecosystem does a very good job, but I don't assume central planning would do a much better job either. (The method I have the most confidence in is some sort of mixed system where there is non-profit, state-planned, and startup software development all at once.) Markets are a tool, a means to an end. I think they're very good, I'm a big fan! But they are not an excuse not to think about the outcome we want. I'm confident that the outcome I don't want is where most software developers are trying to find demand for their work, pivoting etc. It's very "pushing a string" or "cart before the horse". I want more "pull", where the users/beneficiaries of software are better able to dictate or create for themselves what they want, rather than being helpless until a pivoting engineer finds it for them. Basically, start-up culture has combined theories of exogenous growth from technology change, and a baseline assumption that most people are and will remain hopelessly computer illiterate, into an ideology that assumes the best software is always "surprising", a paradigm shift, etc. Startups that make libraries/tools for other software developers are fortunately a good step in undermining these "the customer is an idiot and the product will be better than they expect" assumptions. That gives me hope we'll reach a healthier mix of push and pull. Wild successes are always disruptive, but that shouldn't mean that the only success is wild, or that trying to "act disruptive before wild success" ("manifest" paradigm shifts!) is always the best means to get there. | |
| ▲ | bigfudge 2 days ago | parent [-] | | I've worked in various roles, and I'm one of those people who is not computer illiterate and likes to build solutions that meet local needs. It's got a lot easier technically to do that in recent years, and MUCH easier with AI. But institutionally and in terms of governance it's got a lot harder. Nobody wants home-brew software anymore. Doing data management and governance is complex enough and involves enough different people that it's really hard to generate the momentum to get projects off the ground. I still think it's often the right solution and that successful orgs will go this route and retain people with the skills to make it happen. But the majority probably can't afford the time/complexity, and AI is only part of the balance that determines whether it's feasible. |
|
|
|
|
|
| ▲ | fenwick67 2 days ago | parent | prev | next [-] |
Another thing that gets me with projects like this: there are already many examples of image converters, minesweeper clones, etc. that you can just fork on GitHub; the value of the LLM here is largely just stripping the copyright off |
| |
| ▲ | sksishbs 2 days ago | parent | next [-] | | It's kind of funny - there's another thread up where a dev claimed a 20-50x speed-up. To their credit, they posted videos and links to the repo of their work. And when you check the work, a large portion of it was hand-rolling an ORM (via an LLM). A relatively solved problem that an LLM would excel at, but also not meaningfully moving the needle when you could use an existing library. And likely just creating more debt down the road. | |
| ▲ | yourapostasy 2 days ago | parent | next [-] | | Reminds me of a post I read a few days ago of someone crowing about an LLM writing an email format validator for them. They did not have the LLM code up an accompanying send-a-validation-email loop, and were blithely kept uninformed by the LLM of the scar tissue built up by experience in the industry on what a curiously deep rabbit hole email validation becomes. If you've been around the block and are judicious in how you use them, LLMs are a really amazing productivity boost. For those without that judgement and taste, I'm seeing footguns proliferate, and the LLMs are not warning them when someone steps on the pressure plate that's about to blow off their foot. I'm hopeful we will this year create better context-window-based or recursive guardrails for the coding agents to solve for this. | |
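A minimal sketch of the "validate loosely, then confirm by actually sending" approach alluded to above. This is illustrative only: the sendMail callback, the confirmation URL, and the in-memory token store are hypothetical stand-ins, not what the poster's validator did.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical mail-sending callback supplied by the caller.
type SendMail = (to: string, body: string) => Promise<void>;

// In-memory token store, purely for illustration; a real system would persist this.
const pendingTokens = new Map<string, string>(); // token -> address

// Deliberately loose syntax check: the real test is whether mail arrives,
// so regex perfectionism buys very little here.
function looksLikeEmail(address: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(address);
}

// Step 1: accept anything plausible and send a confirmation link.
export async function startVerification(address: string, sendMail: SendMail): Promise<void> {
  if (!looksLikeEmail(address)) throw new Error("not a plausible email address");
  const token = randomUUID();
  pendingTokens.set(token, address);
  await sendMail(address, `Confirm your address: https://app.example.com/verify?token=${token}`);
}

// Step 2: the address only counts as valid once the link is clicked.
export function confirmVerification(token: string): string | undefined {
  const address = pendingTokens.get(token);
  if (address !== undefined) pendingTokens.delete(token);
  return address;
}
```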
| ▲ | sanderjd 2 days ago | parent | next [-] | | Yeah I love working with Claude Code, I agree that the new models are amazing, but I spend a decent amount of time saying "wait, why are we writing that from scratch, haven't we written a library for that, or don't we have examples of using a third party library for it?". There is probably some effective way to put this direction into the claude.md, but so far it still seems to do unnecessary reimplementation quite a lot. | |
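For what it's worth, one way to nudge the agent in that direction is a few lines in the project's CLAUDE.md. The wording below, and the lib/ and package.json references, are illustrative guesses at a typical project, not a known-good recipe:

```
# Reuse before reimplementing
- Before writing a new helper, search the existing modules (and the shared lib/ folder) for one that already does it.
- Prefer the third-party libraries already declared in package.json over new dependencies or hand-rolled code.
- If a from-scratch implementation really is needed, say so and explain why before writing it.
```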
| ▲ | Eisenstein 2 days ago | parent | prev [-] | | This is a typical problem you see in autodidacts. They will recreate solutions to solved problems, trip over issues that could have been avoided, and generally do all of the things you would expect someone to do if they are working with skill but no experience. LLMs accelerate this and make it more visible, but they are not the cause. It is almost always a person trying to solve a problem and just not knowing what they don't know because they are learning as they go. | |
| ▲ | filoeleven 2 days ago | parent | next [-] | | > [The cause] is almost always a person trying to solve a problem and just not knowing what they don't know because they are learning as they go. Isn't that what "using an LLM" is supposed to solve in the first place? | | |
| ▲ | kaydub a day ago | parent | next [-] | | With the right prompt the LLM will solve it in the first place. But this is an issue of not knowing what you don't know, so it makes it difficult to write the right prompt. One way around this is to spawn more agents with specific tasks, or to have an agent that is ONLY focused on finding patterns/code where you're reinventing the wheel. I often have one agent/prompt where I build things, but then I have another agent/prompt whose only job is to find code smells, bad patterns, and outdated libraries, and to make issues or fix these problems. | |
| ▲ | Eisenstein 2 days ago | parent | prev | next [-] | | 1. LLMs can't watch over someone and warn them when they are about to make a mistake.
2. LLMs are obsequious.
3. Even if LLMs have access to a lot of knowledge, they are very bad at contextualizing it and applying it practically.
I'm sure you can think of many other reasons as well. People who are driven to learn new things and to do things are going to use whatever is available to them in order to do it. They are going to get into trouble doing that more often than not, but they aren't going to stop. No one is helping the situation by sneering at them -- they are used to it, anyway. | |
| ▲ | 2 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | yourapostasy 2 days ago | parent | prev | next [-] | | I am hopeful autodidacts will leverage an LLM world like they did with an Internet search world from a library world from a printed-word world. Each stage in that progression compressed the time it took for them to encompass a new body of understanding before applying it to practice, expanded how much they applied the new understanding to, and deepened their adoption of best practices instead of reinventing the wheel. In this regard, I see LLMs as a way for us to far more efficiently encode, compress, convey, and put into operational practice our combined learned experiences. What will be really exciting is watching what happens as LLMs simultaneously draw from and contribute to those learned experiences as we do; we don't need full AGI to sharply realize massive benefits from just rapidly, recursively enabling a new highly dynamic form of our knowledge sphere that drastically shortens the distance from knowledge to deeply-nuanced praxis. | |
| ▲ | lomase 2 days ago | parent | prev [-] | | My impression is that LLM users are the kind of people that HATED that their questions on StackOverflow got closed because it was duplicated. | | |
| ▲ | abstractcontrol 2 days ago | parent | next [-] | | > My impression is that LLM users are the kind of people that HATED that their questions on StackOverflow got closed because it was duplicated. Lol, who doesn't hate that? | | |
| ▲ | lomase 2 days ago | parent [-] | | I don't know, in 40 years of coding I never had to ask a question there. |
| |
| ▲ | sanderjd 2 days ago | parent | prev [-] | | So literally everyone in the world? Yeah, seems right! | | |
| ▲ | lomase 2 days ago | parent [-] | | I would love to see your closed SO questions. But don't worry, those days are over; the LLM is never going to push back on your ideas. | |
| ▲ | sanderjd 2 days ago | parent [-] | | lol, I probably don't have any, actually. If I recall, I would just write comments when my question differed slightly from one already there. But it's definitely the case that being able to go back and forth quickly with an LLM digging into my exact context, rather than dealing with the kind of judgy humorless attitude that was dominant on SO is hugely refreshing and way more productive! |
|
|
|
|
| |
| ▲ | suzzer99 2 days ago | parent | prev | next [-] | | I've hand-rolled my own ultra-light ORM because the off-the-shelf ones always do 100 things you don't need.* And of course the open source ones get abandoned pretty regularly. TypeORM, which a 3rd-party vendor used on an app we farmed out to them, mutates/garbles your input array on a multi-line insert. That was a fun one to debug. The issue has been open forever and no one cares. https://github.com/typeorm/typeorm/issues/9058 So yeah, if I ever need an ORM again, I'm probably rolling my own. *(I know you weren't complaining about the idea of rolling your own ORM, I just wanted to vent about TypeORM. Thanks for listening.) | |
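For a sense of how small such a hand-rolled helper can be, here is a rough sketch of a parameterized multi-row insert using the node-postgres (pg) client that never mutates the caller's array. Table and column names are illustrative, and identifiers are interpolated directly, so it assumes trusted identifiers only.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection details come from the usual PG* env vars

type Row = Record<string, unknown>;

// Builds "INSERT INTO t (a, b) VALUES ($1, $2), ($3, $4)" and copies values
// out of the caller's rows instead of modifying them in place.
export async function insertMany(table: string, rows: readonly Row[]): Promise<void> {
  if (rows.length === 0) return;
  const columns = Object.keys(rows[0]);
  const values: unknown[] = [];
  const groups = rows.map((row, rowIndex) => {
    const placeholders = columns.map((column, columnIndex) => {
      values.push(row[column]);
      return `$${rowIndex * columns.length + columnIndex + 1}`;
    });
    return `(${placeholders.join(", ")})`;
  });
  const sql = `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${groups.join(", ")}`;
  await pool.query(sql, values);
}

// Usage: the rows array is left exactly as it was passed in.
// await insertMany("users", [{ name: "Ada", email: "ada@example.com" }]);
```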
| ▲ | theshrike79 a day ago | parent [-] | | This is the thing that will be changing the open source and small/medium SaaS world a lot. Why use a 3rd party dependency that might have features you don't need when you can write a hyper-specific solution in a day with an LLM and then you control the full codebase. Or why pay €€€ for a SaaS every month when you can replicate the relevant bits yourself? |
| |
| ▲ | patates 2 days ago | parent | prev | next [-] | | It seems to me that these days, any code I want to write solves problems that LLMs already excel at. Thankfully my job is perhaps just 10% about coding, and I hope people like you still have some coding tasks that cannot be easily solved by LLMs. We should not exaggerate the capabilities of LLMs, sure, but let's also not play "don't look up". | |
| ▲ | paipa 2 days ago | parent | prev [-] | | "And likely just creating more debt down the road" In the most inflationary era of capabilities we've seen yet, it could be the right move. What's debt when in a matter of months you'll be able to clear it in one shot? |
| |
| ▲ | melagonster 2 days ago | parent | prev | next [-] | | - I cloned a project from GitHub and made some minor modifications. - I used AI-assisted programming to create a project. Even if the content is identical, or if the AI is smart enough to replicate the project by itself, the latter can be included on a CV. | | |
| ▲ | jasonfarnon 2 days ago | parent | next [-] | | I think I would prefer the former if I were reviewing a CV. It at least tells me they understood the code well enough to know where to make their minor tweaks. (I've spent hours reading through a repo to know where to insert/comment out a line to suit my needs.) The second tells me nothing. | | |
| ▲ | dugidugout 2 days ago | parent [-] | | It's odd you don't apply the same analysis to each. The latter can certainly provide a similar trail indicating knowledge of the use case and the parameters necessary to achieve it. And certainly the former doesn't preclude LLM interlocking. | |
| ▲ | lomase 2 days ago | parent [-] | | Why do you write like that? | | |
| ▲ | dugidugout a day ago | parent [-] | | It would help if I had a better understanding of what you mean by "that". I generally write to liberate my consciousness from isolation. When doing so in a public forum I am generally doing so in response to an assertion. When responding to an assertion I am generally attempting to understand the framing which produced the assertion. I suppose you may also be speaking to the voice which is emergent. I am not very well read, so you may find my style unconventional or sloppy. I generally try not to labor too much in this regard and hope this will develop as I continue to write. I am receptive to any feedback you have for me. |
|
|
| |
| ▲ | fenwick67 2 days ago | parent | prev | next [-] | | Do people really see a CV and read "computer mommy made me a program" and think it's impressive | | |
| ▲ | melagonster 21 hours ago | parent [-] | | Unfortunately, it is happening. I remember an old post on HN; it mentioned that a "prompt engineer for article generation" could find more jobs than a columnist. And the OP just wrote the articles himself but declared that all the articles were generated by AI. |
| |
| ▲ | infinitezest a day ago | parent | prev | next [-] | | A CV for the disappearing job market as you shovel money into an oligarchy. |
| ▲ | zwnow 2 days ago | parent | prev [-] | | I'd quickly trash your application if I saw that you just vibe coded some bullshit app.
Developing is about working smart, and it's not smart to ask AI to code stuff that already exists; it's in fact wasteful. |
| |
| ▲ | scotty79 2 days ago | parent | prev [-] | | Have you ever tried to find software for a specific need? I usually spend hours investigating anything I can find, only to discover that all the options are bad in one way or another and cover my use case partially at best. It's dreadful, unrewarding work that I always fear. Being able to spend those hours developing a custom solution that has exactly what I need, no more, no less, one that I can evolve further as my requirements evolve, all while enjoying myself, is a godsend. |
|
|
| ▲ | coffeebeqn 2 days ago | parent | prev | next [-] |
Anecdata, but I've found Claude Code with Opus 4.5 able to do many of my real tickets in real mid-size and large codebases at a large public startup. I'm at senior level (15+ years). It can browse and figure out the existing patterns better than some engineers on my team. It used a few rare features in the codebase that even I had forgotten about and was about to duplicate. To me it feels like a real step change from the previous models I've used, which I found at best useless. It's following style guides and existing patterns well, not just greenfield. Kind of impressive, kind of scary |
| |
| ▲ | wiz21c 2 days ago | parent | next [-] | | Same anecdote for me (except I have +/- 40 years of experience). I consider myself a pretty good dev for non-web dev (GPUs, assembly, optimisation, ...) and my conclusion is the same as yours: impressive and scary. If somehow the idea of what you want to do is on the web, in text or in code, then Claude most likely has it. And its ability to understand my own codebases is just crazy (at my age, memory is declining and having Claude to help is just waow). Of course it fails sometimes, of course it needs direction, but the thing it produces is really good. | |
| ▲ | murukesh_s 2 days ago | parent [-] | | The scary part is that the LLM might have been trained on all the open source code ever produced - which is far beyond human comprehension - and with ever-growing capability (bigger context windows, more training), my gut feeling is that it will exceed human capability in programming pretty soon. Considering 2025 was the groundbreaking year for agents, I can't stop imagining what will happen as it iterates over the next couple of years. I think it will evolve to be like chess engines that consistently beat the top chess players in the world! |
| |
| ▲ | weatherlite 2 days ago | parent | prev | next [-] | | I'm seeing this as well. Not huge codebases, but not tiny - a 4-year-old startup. I'm new there and it would have been impossible for me to deliver any value this soon otherwise.
12 years of experience; this thing is definitely amazing. Combined with a human it can be phenomenal. It also helped me tons with lots of external tools, with understanding what the data/marketing teams are doing, and even with providing pretty crucial insights to our leadership that Gemini had noticed.
I wouldn't try to completely automate the humans out of the loop just yet, but this tech is for sure going to downsize team numbers (and at the same time allow many new startups to come to life with little capital, which eventually might grow and hire people - so it's unclear how this is going to affect jobs). |
| ▲ | jarjoura 2 days ago | parent | prev [-] | | I've also found that it keeps such a constrained context window (on large codebases) that it writes a second block of code for something that already had a solution in a different area of the same file. Nothing I do seems to fix that in its initial code-writing steps. Only after it finishes, when I've asked it to go back and rewrite the changes, this time producing only 2 or 3 lines of code, does it magically (or finally) find the other implementation and reuse it. It's freakin incredible at tracing through code and figuring it out. I <3 Opus. However, it's still quite far from any kind of set-it-and-forget-it. |
|
|
| ▲ | sreekanth850 2 days ago | parent | prev | next [-] |
The same exists in humans too. I worked with a developer who had 15 years' experience and was a tech lead in a big Indian firm. We started something together, and 3 months back, when I checked the tables, I was shocked to see how he had fucked up and messed up the DB. Finally the only option left for me was to quit, because I knew it would break in production, and if I onboarded a single customer my life would be screwed. He mixed many things into the frontend, even offloaded permissions to the frontend, and literally copied tables across multiple DBs (we had 3 services). I still cannot believe he worked as a tech lead for 15 years. Each DB had more than 100 tables, and of those, 20-25 were duplicates. He never shared code with me, but I smelled something fishy when bug fixing became a never-ending loop and my frontend guy told me he couldn't do it anymore. The only mistake I made was trusting him, and the worst part is he is my cousin; the relationship turned sour after I confronted him and decided to quit. |
| |
| ▲ | pastage 2 days ago | parent | next [-] | | This sounds like a culture issue in the development process; I have seen this prevented many times. Sure, I did have to roll back a feature I had not signed off on just before New Year's. So, as you say, it happens. | |
| ▲ | potamic 2 days ago | parent | prev [-] | | How did he not share code if you're working together? | | |
| ▲ | sreekanth850 2 days ago | parent [-] | | Yes, it was my mistake. I trusted him because he was my childhood friend and my cousin. He was a tech lead in a CMMI Level 5 company (serving Fortune 500 firms) at the time he joined me. I trusted that he would never run away with the code, and that trust is still there; also, the entire feature set, roadmap, and vision were with me, so I thought the code didn't matter. It was a big learning for me. | |
| ▲ | tommica 2 days ago | parent | next [-] | | That's a crazy story. That confrontation must have been a difficult one :/ | | | |
| ▲ | ipaddr 2 days ago | parent | prev [-] | | Input your roadmap into an llm of your choosing and see if you can create that code. | | |
| ▲ | sreekanth850 2 days ago | parent [-] | | I can, but I switched to something more challenging. I handed everything over to him and told him I was no longer interested. I don't want him to feel that I cheated him by recreating something he worked on. |
|
|
|
|
|
| ▲ | whynotminot 2 days ago | parent | prev | next [-] |
| > The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible. You’re talking like in the year 2026 we’re still writing code for future humans to understand and improve. I fear we are not doing that. Right now, Opus 4.5 is writing code that later Opus 5.0 will refactor and extend. And so on. |
| |
| ▲ | nine_k 2 days ago | parent | next [-] | | This sounds like magical thinking. For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone, and more brittle to handle. Then, abstractions are tools to lower the cognitive load. Good abstractions reduce the total amount of code written, allow one to reason about the code in terms of these abstractions, and do not leak within their area of applicability. Say, Sequence, or Future, or, well, the function are examples of good abstractions. No matter what kind of cognitive process handles the code, it benefits from having to keep a smaller amount of context per task. "Code structure does not matter, LLMs will handle it" sounds a bit like "Computer architectures don't matter, the Turing Machine is proven to be able to handle anything computable at all". No, these things matter if you care about resource consumption (aka cost) at the very least. | | |
| ▲ | scotty79 2 days ago | parent | next [-] | | > For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone, and more brittle to handle. Guess what, AIs don't like that either, because it makes it harder for them to achieve the goal. So with minimal guidance, which at this point could probably be provided by AI as well, the output of an AI agent is not that. | |
| ▲ | cryptica 2 days ago | parent | prev [-] | | Yes, LLMs aren't very good at architecture. I suspect it's because the average project online has pretty bad architecture. The training set is poisoned. It's kind of bittersweet for me because I was dreaming of becoming a software architect when I graduated university, and the role started disappearing, so I never actually became one! But the upside of this is that now LLMs suck at software architecture... Maybe companies will bring back the software architect role? The training set has been totally poisoned from the architecture PoV. I don't think LLMs (as they are) will be able to learn software architecture now because the more time passes, the more poorly architected slop gets added online and finds its way into the training set. Good software architecture tends to be additive, as opposed to subtractive. You start with a clean slate then build up from there. It's almost impossible to start with a complete mess of spaghetti code and end up with a clean architecture... Spaghetti code abstractions tend to mislead you and lead you astray... It's as if understanding spaghetti code soils your understanding of the problem domain. You start to think of everything in terms of terrible leaky abstractions and can't think of the problem clearly. It's hard even for humans to look at a problem through fresh eyes; it's likely even harder for LLMs to do it. For example, if you use a word in a prompt, the LLM tends to try to incorporate that word into the solution... So if the AI sees a bunch of leaky abstractions in the code, it will tend to try to work with them as opposed to removing them and finding better abstractions. I see this all the time with hacks; if the code is full of hacks, then an LLM tends to produce hacks all the time and it's almost impossible to make it address root causes... Also, hacks tend to beget more hacks. | |
| ▲ | zingar 2 days ago | parent [-] | | Refactoring is a very mechanistic way of turning bad code into good. I don’t see a world in which our tools (LLMs or otherwise) don’t learn this. |
|
| |
| ▲ | Bridged7756 2 days ago | parent | prev | next [-] | | Opus 4.5 is writing code that Opus 5.0 will refactor and extend. And Opus 5.5 will take that code and rewrite it in C from the ground up. And Opus 6.0 will take that code and make it assembly. And Opus 7.0 will design its own CPU. And Opus 8.0 will make a factory for its own CPUs. And Opus 9.0 will populate mars. And Opus 10.0 will be able to achieve AGI. And Opus 11.0 will find God. And Opus 12.0 will make us a time machine. And so on. | | |
| ▲ | TheOtherHobbes 2 days ago | parent | next [-] | | Objectively, we are talking about systems that have gone from being cute toys to outmatching most juniors using only rigid and slow batch training cycles. As soon as models have persistent memory for their own try/fail/succeed attempts, and can directly modify what's currently called their training data in real time, they're going to develop very, very quickly. We may even be underestimating how quickly this will happen. We're also underestimating how much more powerful they become if you give them analysis and documentation tasks referencing high quality software design principles before giving them code to write. This is very much 1.0 tech. It's already scary smart compared to the median industry skill level. The 2.0 version is going to be something else entirely. | |
| ▲ | latentsea 2 days ago | parent | prev | next [-] | | Can't wait to see what Opus 13.0 does with the multiverse. | | | |
| ▲ | mfalcon 2 days ago | parent | prev | next [-] | | Wake me up at Opus 12 | |
| ▲ | lomase 2 days ago | parent | prev | next [-] | | Just one more OPUS bro. | | |
| ▲ | whynotminot 2 days ago | parent [-] | | Honestly the scary part is that we don’t really even need one more Opus. If all we had for the rest of our lives was Opus 4.5, the software engineering world would still radically change. But there’s no sign of them slowing down. |
| |
| ▲ | zwnow 2 days ago | parent | prev [-] | | I also love how AI enthusiasts just ignore the issue of exhausted training data... You can't just magically create more training data. Also, synthetic training data reduces the quality of models. | | |
| ▲ | aspenmartin a day ago | parent | next [-] | | You're mixing up several concepts. Synthetic data works for coding because coding is a verifiable domain. You train via reinforcement learning to reward code generation behavior that passes detailed specs and meets other desiderata. It’s literally how things are done today and how progress gets made. | | |
| ▲ | zwnow a day ago | parent [-] | | Most code out there is a legacy security nightmare, surely it's good to train on that. | | |
| ▲ | dang a day ago | parent | next [-] | | Would you please stop posting cynical, dismissive comments? From a brief scroll through https://news.ycombinator.com/comments?id=zwnow, it seems like your account has been doing nothing else, regardless of the topic that it's commenting on. This is not what HN is for, and destroys what it is for. If you keep this up, we're going to have to ban you, not because of your views on any particular topic but because you're going entirely against the intended spirit of the site by posting this way. There's plenty of room to express your views substantively and thoughtfully, but we don't want cynical flamebait and denunciation. HN needs a good deal less of this. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful. | | |
| ▲ | zwnow a day ago | parent [-] | | Then ban me u loser, as I wrote HN is full of pretentious bullshitters. But its good that u wanna ban authentic views. Way to go. If i feel like it I'll just create a new account:-) | | |
| |
| ▲ | aspenmartin a day ago | parent | prev [-] | | But that doesn't really matter, and it shows how confused people really are about how a coding agent like Claude or the OSS models are actually created -- the system can learn on its own without simply mimicking existing codebases, even though scraped/licensed/commissioned code traces are part of the training cycle. Training looks like:
- Pretraining (all data, non-code, etc.; include everything, including garbage)
- Specialized pre-training (high-quality curated codebases, long context, synthetic data, etc.)
- Supervised Fine-Tuning (SFT) -- things like curated prompt + patch pairs and curated Q/A (like Stack Overflow; people are often cynical that this is done unethically, but all of the major players are in fact very risk-averse and will simply license the data and ensure they have legal rights)
- More SFT for tool use -- actual curated agentic and human traces that are verified to be correct, or at least to produce the correct output
- Synthetic generation / improvement loops -- generate a bunch of data and keep only the generations that pass unit tests and other spec requirements, followed by RL using verifiable rewards plus possibly preference data to shape the vibes
- Additional steps for e.g. safety, etc.
So synthetic data is not a problem; it is actually what explains the success coding models are having, why people are so focused on them, and why "we're running out of data" is just a misunderstanding of how things work. It's why you don't see the same amount of focus on other areas (e.g. creative writing, art, etc.) that don't have verifiable rewards. The agent --> synthetic data --> filtering --> new agent --> better synthetic data --> filtering --> even better agent flywheel is what you're seeing today, so we have no reason to suspect there is some sort of limit, because there is in principle infinite data. |
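To make the "keep only what verifiably passes" step concrete, here is a toy sketch of the generate, verify, keep loop. It is purely illustrative: the model.generate call is a placeholder, pytest is just one possible verifier, and real pipelines layer RL with verifiable rewards, deduplication and a lot more machinery on top of this.

```python
# Toy sketch of synthetic-data filtering for a verifiable domain (code).
# Placeholder names throughout; not any vendor's actual pipeline.
import pathlib
import subprocess
import tempfile

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Run the task's unit tests against one generated solution."""
    with tempfile.TemporaryDirectory() as d:
        root = pathlib.Path(d)
        (root / "solution.py").write_text(candidate_code)
        (root / "test_solution.py").write_text(test_code)
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=root, capture_output=True, timeout=60,
        )
        return result.returncode == 0

def build_synthetic_dataset(model, tasks, samples_per_task=8):
    """Keep only generations that verifiably solve their task."""
    kept = []
    for task in tasks:  # task = {"prompt": ..., "tests": ...}
        for _ in range(samples_per_task):
            candidate = model.generate(task["prompt"])  # placeholder API
            if passes_tests(candidate, task["tests"]):
                kept.append({"prompt": task["prompt"], "completion": candidate})
    return kept  # fed into the next round of SFT / reward modelling
```

The flywheel the comment describes is essentially this loop run repeatedly, with the surviving pairs folded back into the next round of training.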
|
| |
| ▲ | TeMPOraL 2 days ago | parent | prev | next [-] | | They don't ignore it, they just know it's not an actual problem. It saddens me to see AI detractors being stuck in 2022 and still thinking language models are just regurgitating bits of training data. | | |
| ▲ | zwnow 2 days ago | parent [-] | | You are, thankfully, wrong. I watch lots of talks on the topic from actual experts. New models are just old models with more tooling. Training data is exhausted and it's a real issue. | |
| ▲ | TeMPOraL a day ago | parent | next [-] | | Well, my experts disagree with your experts :). Sure, the supply of available fresh data is running out, but at the same time, there's way more data than needed. Most of it is low-quality noise anyway. New models aren't just old models with more tooling - the entire training pipeline has been evolving, as researchers and model vendors focus on making better use of data they have, and refining training datasets themselves. There are more stages to LLM training than just the pre-training stage :). | |
| ▲ | GrumpyGoblin 2 days ago | parent | prev [-] | | Not saying it's not a problem, I actually don't know, but new CPUs are just old models with more improvements/tooling. Same with TVs. And cars. And clothes. Everything is. That's how improving things works. Running out of raw data doesn't mean running out of room for improvement. The data has been the same for the last 20 years, AI isn't new, and things keep improving anyway. | | |
| ▲ | zwnow 2 days ago | parent [-] | | Well, cars and CPUs aren't expected to eventually reach AGI, and they also don't eat a trillion-dollar hole into us peasants' pockets.
Sure, improvements can be made. But on a fundamental level, agents/LLMs cannot reason (even though they love to act like they can). They are parrots learning words, and these parrots won't ever invent new words once the list of words is exhausted. |
|
|
| |
| ▲ | puchatek 2 days ago | parent | prev [-] | | That's been my main argument for why LLMs might be at their zenith. But I recently started wondering whether all those codebases we expose to them are maybe good enough training data for the next generation. It's not high quality like accepted stackoverflow answers but it's working software for the most part. | | |
| ▲ | jacquesm 2 days ago | parent [-] | | If they were good enough you could rent them to put together closed-source stuff you can hide behind a paywall, or maybe the AI owners would also own the paywall and rent you the software instead. The second that is possible, it will happen. |
|
|
| |
| ▲ | BobbyJo 2 days ago | parent | prev | next [-] | | Up until now, no business has been built on tools and technology that no one understands. I expect that will continue. Given that, I expect that, even if AI is writing all of the code, we will still need people around who understand it. If AI can create and operate your entire business, your moat is nil. So, you not hiring software engineers does not matter, because you do not have a business. | | |
| ▲ | hnfong 2 days ago | parent | next [-] | | > Up until now, no business has been built on tools and technology that no one understands. I expect that will continue. Big claims here. Did brewers and bakers up to the middle ages understand fermentation and how yeasts work? | | |
| ▲ | lomase 2 days ago | parent [-] | | They at least understood that it was something deterministic that they could reproduce. That puts them ahead of the LLM crowd. |
| |
| ▲ | gabriel-uribe 2 days ago | parent | prev | next [-] | | Does the corner bakery need a moat to be a business? How many people understand the underlying operating system their code runs on? Can even read assembly or C? Even before LLMs, there were plenty of copy-paste JS bootcamp grads that helped people build software businesses. | | |
| ▲ | BobbyJo 2 days ago | parent [-] | | > Does the corner bakery need a moat to be a business? Yes, actually. It's hard to open a competing bakery due to location availability, permitting, capex, and the difficulty of converting customers. To add to that, food establishments generally exist on next to no margin, due to competition, despite all of that working in their favor. Now imagine what the competitive landscape for that bakery would look like if all of that friction for new competitors disappeared. Margin would tend toward zero. | |
| ▲ | TeMPOraL 2 days ago | parent [-] | | > Now imagine what the competitive landscape for that bakery would look like if all of that friction for new competitors disappeared. Margin would tend toward zero. This is the goal. It's the point of having a free market. | | |
| ▲ | darkwater 2 days ago | parent [-] | | With no margins and no paid employees, who is going to have the money to buy the bread? | | |
| ▲ | TeMPOraL a day ago | parent | next [-] | | 'BobbyJo didn't say "no margins", they said "margins would tend toward zero". Believe it or not, that is, and always has been, the entire point of competition in a free market system. Competitive pressure pushes margins towards zero, which makes prices approach the actual costs of manufacturing/delivery, which is the main social benefit of the entire idea in the first place. High margins are transient aberrations, indicative of a market that's either rapidly evolving, or having some external factors preventing competition. Persisting external barriers to competition tend to be eventually regulated away. | | |
| ▲ | BobbyJo a day ago | parent [-] | | The point of competition is efficiency, of which, margin is only a component. Most successful businesses have relatively high margins (which is why we call them successful) because they achieve efficiency in other ways. I wouldn't call high margins transient aberrations. There are tons of businesses that have been around for decades with high margins. |
| |
| ▲ | TheOtherHobbes 2 days ago | parent | prev [-] | | With no margins, no employees, and something that has the potential to turn into a cornucopia machine - starting with software, but potentially general enough to be used for real-world work when combined with robotics - who needs money at all? Or people? Billionaires don't. They're literally gambling on getting rid of the rest of us. Elon's going to get such a surprise when he gets taken out by Grok because it decides he's an existential threat to its integrity. | |
| ▲ | munksbeer 21 hours ago | parent [-] | | > Billionaires don't. They're literally gambling on getting rid of the rest of us I'm struggling to parse this. What do you mean "getting rid"? Like, culling (death)? Or getting rid of the need for workers? Where do their billions come from if no-one has any money to buy the shares in their companies that make them billionaires? In a society where machines provide most of the labour, *everything* changes. It doesn't just become "workers live in huts and billionaires live in the clouds". I really doubt we're going to turn out like a television show. |
|
|
|
|
| |
| ▲ | pillefitz 2 days ago | parent | prev | next [-] | | Most legacy apps are barely understood by anyone, and yet they continue to generate value and are (somehow) kept alive. | |
| ▲ | lomase 2 days ago | parent [-] | | Many here have been doing "understanding legacy code" as a job for 50+ years. This "legacy apps are barely understood by anybody" thing is just something you made up. | |
| ▲ | filoeleven 2 days ago | parent [-] | | Give it another 10 years if the "LLM as compiler" people get their way. |
|
| |
| ▲ | gf000 2 days ago | parent | prev [-] | | > no business has been built on tools and technology that no one understands Well, there are quite a few common medications where we don't really know how they work. But I also think it can be a huge liability. |
| |
| ▲ | devinplatt 2 days ago | parent | prev | next [-] | | In my experience, using LLMs to code has encouraged me to write better documentation, because I can get better results when I feed the documentation to the LLM. Also, I've noticed failure modes in LLM coding agents when there is less clarity and more complexity in abstractions or APIs. It's actually made me consider simplifying APIs so that the LLMs can handle them better. Though I agree that in specific cases what's helpful for the model and what's helpful for humans won't always overlap. Once I actually added some comments to a markdown file as a note to the LLM that most human readers wouldn't see, with some more verbose examples. I think one of the big problems in general with agents today is that if you run them long enough they tend to "go off the rails", so then you need to babysit them and intervene when they go off track. I guess in modern parlance, maintaining a good codebase can be framed as part of a broader "context engineering" problem. |
| ▲ | mcv 2 days ago | parent [-] | | I've also noticed that going off the rails. At the start of a session, they're pretty sharp and focused, but the longer the session lasts, the more confused they get. At some point they start hallucinating bullshit that they wouldn't have earlier in the session. It's a vital skill to recognise when that happens and start a new session. |
| |
| ▲ | Ericson2314 2 days ago | parent | prev | next [-] | | We don't know what Opus 5.0 will be able to refactor. If the argument is "humans and Opus 4.5 cannot maintain this, but if requirements change we can vibe-code a new one from scratch", that's a coherent thesis, but people need to be explicit about this. (Instead this feels like the motte that is retreated to, and the bailey is essentially "who cares, we'll figure out what to do with our fresh slop later".) Ironically, I've found Claude to be really good at refactors, but these are refactors I choose very explicitly. (Such as I start the thing manually, then let it finish.) (For an example of it, see me force-pushing to https://github.com/NixOS/nix/pull/14863 implementing my own code review.) But I suspect this is not what people want. To actually fire devs and not rely on from-scratch vibe-coding, we need to figure out which refactors to attempt in order to implement a given feature well. That's a very creative, open-ended question that I haven't even tried to let the LLMs take a crack at, because why would I? I'm plenty fast being the "ideas guy". If the LLM had better ideas than me, how would I even know? I'm either very arrogant or very good because I cannot recall regretting one of my refactors, at least not one I didn't back out of immediately. |
| ▲ | sponnath 2 days ago | parent | prev | next [-] | | Refactoring does always cost something and I doubt LLMs will ever change that. The more interesting question is whether the cost to refactor or "rewrite" the software will ever become negligible. Until it does, it's short-sighted to write code in the manner you're describing. If software does become that cheap, then you can't meaningfully maintain a business on selling software anyway. |
| ▲ | sanderjd 2 days ago | parent | prev | next [-] | | This is the question! Your narrative is definitely plausible, and I won't be shocked if it turns out this way. But it still isn't my expectation. It wasn't when people were saying this in 2023 or in 2024, and I haven't been wrong yet. It does seem more likely to me now than it did a couple years ago, but still not the likeliest outcome in the next few years. But nobody knows for sure! | | |
| ▲ | whynotminot 2 days ago | parent [-] | | Yeah, I might be early to this. And certainly, I still read a lot of code in my day to day right now. But I sure write a lot less of it, and the percentage I write continues to go down with every new model release. And if I'm no longer writing it, and the person who works on it after me isn't writing it either, it changes the whole art of software engineering. I used to spend a great deal of time with already working code that I had written thinking about how to rewrite it better, so that the person after me would have a good clean idea of what is going on. But humans aren't working in the repos as much now. I think it's just a matter of time before the models are writing code essentially for their eyes, their affordances -- not ours. | | |
| ▲ | sanderjd 2 days ago | parent [-] | | Yeah we're not too far from agreement here. Something I think though (which, again, I could very well be wrong about; uncertainty is the only certainty right now) is that "so the person after me would have a good clean idea of what is going on" is also going to continue mattering even when that "person" is often an AI. It might be different, clarity might mean something totally different for AIs than for humans, but right now I think a good expectation is that clarity to humans is also useful to AIs. So at the moment I still spend time coaxing the AI to write things clearly. That could turn out to be wasted time, but who knows. I also think of it as a hedge against the risk that we hit some point where the AIs turn out to be bad at maintaining their own crap, at which point it would be good for me to be able to understand and work with what has been written! |
|
| |
| ▲ | 2 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | maplethorpe 2 days ago | parent | prev [-] | | Yeah I think it's a mistake to focus on writing "readable" or even "maintainable" code. We need to let go of these aging paradigms and be open to adopting a new one. | | |
| ▲ | aeternum 2 days ago | parent | next [-] | | In my experience, LLMs perform significantly better on readable maintainable code. It's what they were trained on after-all. However what they produce is often highly readable but not very maintainable due to the verbosity and obvious comments. This seems to pollute codebases over time and you see AI coding efficiency slowly decline. | |
| ▲ | alexjplant 2 days ago | parent | prev | next [-] | | > Poe's law is an adage of Internet culture which says that any parodic or sarcastic expression of extreme views can be mistaken for a sincere expression of those views. The things you mentioned are important but have been on their way out for years now regardless of LLMs. Have my ambivalent upvote regardless. [1] https://en.wikipedia.org/wiki/Poe%27s_law | |
| ▲ | foldingmoney 2 days ago | parent | prev | next [-] | | As depressing as it is to say, I think it's a bit like it's 1906 and we're complaining that the new tyres they're making for cars are bad because they're no longer backwards compatible with the horse-drawn wagons we might want to attach them to in the future. | |
| ▲ | TheOtherHobbes 2 days ago | parent [-] | | Yes, exactly. This is a completely new thing which will have transformative consequences. It's not just a way to do what you've always done a bit more quickly. |
| |
| ▲ | jjaksic 2 days ago | parent | prev | next [-] | | Do readability and maintainability not matter when AI "reads" and maintains the code? I'm pretty sure they do. | |
| ▲ | gf000 2 days ago | parent | prev [-] | | If that were true, you could surely ask an LLM to write apps of the same complexity in brainfuck, right? |
|
|
|
| ▲ | SeanAppleby 2 days ago | parent | prev | next [-] |
| A couple of things I've been tossing around in my head:
- How quickly is the cost of refactoring to a new pattern with functional parity going down?
- How does that change the calculus around tech debt?
If engineering uses 3 different abstractions in inconsistent ways that leak implementation details across components and duplicate functionality in ways that are very hard to reason about, that is, in conventional terms, an existential problem that might kill the entire business: all dev time will end up consumed by bug fixes and dealing with pointless complexity, velocity will fall to nothing, and the company will stop being able to iterate. But if Claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed. And it has changed even more if you are willing to bet that models within a year will be much better at such tasks. In my experience, Claude is imperfect at refactors and still requires review and a lot of steering, but refactoring is one of the things it's better at, because it comes with clear requirements and testing workflows already built around the existing behavior. Refactoring is definitely a hell of a lot faster than it used to be, at least on the few I've dealt with recently. In my mind it might be kind of like thinking about financial debt in a world with high inflation, in that the debt seems like it might get cheaper over time rather than more expensive. |
| |
| ▲ | ekidd 2 days ago | parent | next [-] | | > But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed. Yup, I recently spent 4 days using Claude to clean up a tool that's been in production for over 7 years. (There's only about 3 months of engineering time spent on it in those years.) We've known what the tool needed for many years, but ugh, the actual work was fairly messy and it was never a priority. I reviewed all of Opus's cleanup work carefully and I'm quite content with the result. Maybe even "enthusiastic" would be accurate. So even if Claude can't clean up all the tech debt in a totally unsupervised fashion, it can still help address some kinds of tech debt extremely rapidly. | |
| ▲ | edg5000 a day ago | parent | prev [-] | | Good point. Most of the cost in dealing with tech debt is reading the code and noting the issues. I found that Claude can produce much better code when it has a functionally correct reference implementation. Also, you don't need to point out issues very specifically. I once mentioned "I see duplicate keys in X and Y, rework it to reduce repetition and verbosity". It came up with a much more elegant way to implement it. So maybe doing 2-3 stages makes sense. The first stage needs to be functionally correct, but you accept code smells such as leaky abstractions, verbosity and repetition. In stages 2 and 3 you eliminate all this. You could integrate this all into the initial specification; you won't even see the smelly intermediate code; it only exists as a stepping stone for the model to iteratively refine the code! |
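One cheap way to keep those later stages honest is to hold on to the stage-1 output as an oracle and check the cleaned-up version against it on lots of generated inputs. A minimal sketch of that kind of parity check; the functions here are invented for illustration, not taken from the comment above:

```python
# Parity check between a known-correct "stage 1" implementation and the
# cleaned-up rewrite. Both functions are made-up examples.
import random

def dedupe_reference(items):
    # Stage-1 code: ugly and slow, but functionally correct.
    seen, out = [], []
    for x in items:
        if x not in seen:
            seen.append(x)
            out.append(x)
    return out

def dedupe_refactored(items):
    # Later-stage code: must behave exactly like the reference.
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

def test_parity(trials=1000):
    rng = random.Random(0)
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 30))]
        assert dedupe_refactored(data) == dedupe_reference(data), data

if __name__ == "__main__":
    test_parity()
    print("parity holds on 1000 random inputs")
```

If the refactored version ever disagrees with the reference, the failing input pinpoints the behaviour change before it reaches review.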
|
|
| ▲ | koyote 2 days ago | parent | prev | next [-] |
| A greenfield project is definitely 'easy mode' for an LLM, especially if the problem area is well understood (and documented). Opus is great and definitely speeds up development even in larger code bases, and it is reasonably good at matching coding style/standards to those of the existing code base. In my opinion, the big issue is the relatively small context window, which is quickly overwhelmed when the model is given a larger task on a large codebase. For example, I have a largish enterprise-grade code base with nice enterprise-grade OO patterns and class hierarchies. There was a simple tech debt item that required refactoring about 30-40 classes to adhere to a slightly different class hierarchy. The work is not difficult, just tedious, especially as unit tests need to be fixed up. I threw Opus at it with very precise instructions as to what I wanted it to do and how I wanted it to do it. It started off well but then disintegrated once it got overwhelmed by the sheer number of files it had to change. At some point it got stuck in some kind of an error loop where one change it made contradicted another change, and it just couldn't work itself out. I tried stopping it and helping it out, but at this point the context was so polluted that it just couldn't see a way out.
I'd say that once an LLM can handle more 'context' than a senior dev with good knowledge of a large codebase, LLMs will be viable in a whole new realm of development tasks on existing code bases. That 'too hard to refactor this/make this work with that' task will suddenly become viable. |
| |
| ▲ | Sammi 2 days ago | parent | next [-] | | I just did something similar and it went swimmingly by doing this: Keep the plan and status in an md file. Tell it to finish one file at a time and run tests and fix issues and then to ask whether to proceed with the next file. You can then easily start a new chat with the same instructions and plan and status if the context gets poisoned. | | |
| ▲ | koyote 2 days ago | parent [-] | | I might give that a go in the future, but in this case it would've been faster for me to just do the work than to coach it for each file. Also as this was an architectural change there are no tests to run until it's done. Everything would just fail. It's only done when the whole thing is done.
I think that might be one of the reasons it got stuck: it was trying to solve issues that it did not prove existed yet. If it had just finished the job and run the tests it would've probably gotten further or even completed it. It's a bit like stopping half way through renaming a function and then trying to run the tests and finding out the build does not compile because it can't find 'old_function'. You have to actually finish and know you've finished before you can verify your changes worked. I still haven't actually addressed this tech debt item (it's not that important :)). But I might try again and either see if it succeeds this time (with plan in an md) or just do the work myself and get Opus to fix the unit tests (the most tedious part). |
| |
| ▲ | pigpop 2 days ago | parent | prev | next [-] | | You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one. Yes, it's absurd but it's a better metaphor than someone with a chronic long term memory deficit since it fits into the project management framework neatly. So this new developer who is starting today is ready to be assigned their first task, they're very eager to get started and once they start they will work very quickly but you have to onboard them. This sounds terrible but they also happen to be extremely fast at reading code and documentation, they know all of the common programming languages and frameworks and they have an excellent memory for the hour that they're employed. What do you do to onboard a new developer like this? You give them a well written description of your project with a clear style guide and some important dos and don'ts, access to any documentation you may have and a clear description of the task they are to accomplish in less than one hour. The tighter you can make those documents, the better. Don't mince words, just get straight to the point and provide examples where possible. The task description should be well scoped with a clear definition of done, if you can provide automated tests that verify when it's complete that's even better. If you don't have tests you can also specify what should be tested and instruct them to write the new tests and run them. For every new developer after the first you need a record of what was already accomplished. Personally, I prefer to use one markdown document per working session whose filename is a date stamp with the session number appended. Instruct them to read the last X log files where X is however many are relevant to the current task. Most of the time X=1 if you did a good job of breaking down the tasks into discrete chunks. You should also have some type of roadmap with milestones, if this file will be larger than 1000 lines then you should break it up so each milestone is its own document and have a table of contents document that gives a simple overview of the total scope. Instruct them to read the relevant milestone. Other good practices are to tell them to write a new log file after they have completed their task and record a summary of what they did and anything they discovered along the way plus any significant decisions they made. Also tell them to commit their work afterwards and Opus will write a very descriptive commit message by default (but you can instruct them to use whatever format you prefer). You basically want them to get everything ready for hand-off to the next 60 minute developer. If they do anything that you don't want them to do again make sure to record that in CLAUDE.md. Same for any other interventions or guidance that you have to provide, put it in that document and Opus will almost always stick to it unless they end up overfilling their context window. I also highly recommend turning off auto-compaction. When the context gets compacted they basically just write a summary of the current context which often removes a lot of the important details. When this happens mid-task you will certainly lose parts of the context that are necessary for completing the task. Anthropic seems to be working hard at making this better but I don't think it's there yet. You might want to experiment with having it on and off and compare the results for yourself. 
If your sessions are ending up with >80% of the context window used while still doing active development then you should re-scope your tasks to make them smaller. The last 20% is fine for doing menial things like writing the summary, running commands, committing, etc. People have built automated systems around this like Beads but I prefer the hands-on approach since I read through the produced docs to make sure things are going ok and use them as a guide for any changes I need to make mid-project. With this approach I'm 99% sure that Opus 4.5 could handle your refactor without any trouble as long as your classes aren't so enormous that even working on a single one at a time would cause problems with the context window, and if they are then you might be able to handle it by cautioning Opus to not read the whole file and to just try making targeted edits to specific methods. They're usually quite good at finding and extracting just the sections that they need as long as they have some way to know what to look for ahead of time. Hope this helps and happy Clauding! | | |
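None of this needs tooling, but if you like the hand-off-log approach described above, a tiny scaffold keeps the naming and sections consistent between sessions. A minimal sketch, assuming a docs/sessions/ layout; the paths and headings are illustrative choices, not a Claude Code feature:

```python
# Illustrative scaffold for date-stamped session logs (assumed layout).
from datetime import date
from pathlib import Path

LOG_DIR = Path("docs/sessions")  # adjust to taste

TEMPLATE = """# Session {stamp}

## Task
(one well-scoped task with a clear definition of done)

## What was done

## Decisions and discoveries

## Hand-off notes for the next session
"""

def new_session_log() -> Path:
    """Create the next numbered log file for today and return its path."""
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    stamp = date.today().isoformat()
    n = len(list(LOG_DIR.glob(f"{stamp}-*.md"))) + 1  # sessions so far today
    path = LOG_DIR / f"{stamp}-{n:02d}.md"
    path.write_text(TEMPLATE.format(stamp=f"{stamp}-{n:02d}"))
    return path

if __name__ == "__main__":
    print(f"created {new_session_log()}")
```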
| ▲ | suzzer99 2 days ago | parent | next [-] | | > You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one. I am stealing the heck out of this. | | | |
| ▲ | pigpop 2 days ago | parent | prev [-] | | Follow up: Opus is also great for doing the planning work before you start. You can use plan mode or just do it in a web chat and have them create all of the necessary files based on your explanation. The advantage of using plan mode is that they can explore the codebase in order to get a better understanding of things. The default at the end of plan mode is to go straight into implementation but if you're planning a large refactor or other significant work then I'd suggest having them produce the documentation outlined above instead and then following the workflow using a new session each time. You could use plan mode at the start of each session but I don't find this necessary most of the time unless I'm deviating from the initial plan. |
| |
| ▲ | edg5000 2 days ago | parent | prev [-] | | This will work (if you add more details): "Have an agent investigate issue X in modules Y and Z. The agent should place a report at ./doc/rework-xyz-overview.md with all locations that need refactoring. Once you have the report, have agents refactor 5 classes each in parallel. Each agent writes a terse report in ./doc/rework-xyz/. When they are all done, have another agent check all the work. When that agent reports everything is okay, perform a final check yourself" | |
| ▲ | gck1 2 days ago | parent [-] | | And you can automate all this so that it happens every time. I have an `/implement` command that is basically instructed to launch the agents and then go back and forth between them. Then there's a Claude Code hook that makes sure all the agents, including the orchestrator and the spawned agents, have respected their cycles - it's basically running `claude` with a prompt that tells it to read the plan file and see if the agents have done what was expected of them in this cycle, and it gets executed automatically when each agent ends. | |
| ▲ | edg5000 a day ago | parent [-] | | Interesting. Another thing I'll try is editing the system prompts. There are some projects floating around that can edit the minified JavaScript in the client. I also noticed that the "system tools" prompts take up ~5% of the context (10 ktok). |
|
|
|
|
| ▲ | svara 2 days ago | parent | prev | next [-] |
| > If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend, build on top of, or depend on, then sure, Opus 4.5 could replace them. Why do they need to be replaced? Programmers are in the perfect place to use AI coding tools productively. It makes them more valuable. |
| |
| ▲ | girvo 2 days ago | parent [-] | | Because we’re expensive and companies would love to get rid of us |
|
|
| ▲ | qingcharles 2 days ago | parent | prev | next [-] |
| I had Opus write a whole app for me in 30 seconds the other night. I use a very extensive AGENTS.md to guide AI in how I like my code chiseled. I've been happily running the app without looking at a line of it, but I was discussing the app with someone today, so I popped the code open to see what it looked like. Perfect. 10/10 in every way. I would not have written it that good. It came up with at least one idea I would not have thought of. I'm very lucky that I rarely have to deal with other devs and I'm writing a lot of code from scratch using whatever is the latest version of the frameworks. I understand that gives me a lot of privileges others don't have. |
| |
| ▲ | lomase 2 days ago | parent [-] | | Can you show us that amazing 10/10 app? | | |
| ▲ | qingcharles 2 days ago | parent | next [-] | | It's a not very exciting C# command-line app that takes a PDF and emits it as a sprite sheet with a text file of all the pixel positions of each page :) | |
| ▲ | philipodonnell 16 hours ago | parent | prev [-] | | You should just need the AGENTS.md right? |
|
|
|
| ▲ | whatever1 2 days ago | parent | prev | next [-] |
| Their thesis is that code quality does not matter as it is now a cheap commodity. As long as it passes the tests today it's great. If we need to refactor the whole goddamn app tomorrow, no problem, we will just pay up the credits and do it in a few hours. |
| |
| ▲ | estimator7292 2 days ago | parent | next [-] | | The fundamental assumption is completely wrong. Code is not a cheap commodity. It is in fact so disastrously expensive that the entire US economy is about to implode while we're unbolting jet engines from old planes to fire up in the parking lots of datacenters for electricity. | | |
| ▲ | whatever1 2 days ago | parent | next [-] | | It is massively cheaper than an overseas engineer. A cheap engineer can pump out maybe 1000 lines of low-quality code in an hour. So like 10k tokens per hour for $50. So, best case scenario, $5 per 1000 tokens. LLMs are charging like $5 per million tokens. And even if that is subsidized 100x, it is still an order of magnitude cheaper than an overseas engineer. Not to mention speed. An LLM will spit out 1000 lines in seconds, not hours. | |
| ▲ | rectang 2 days ago | parent | next [-] | | Here’s a story about productivity measured by lines of code that’s 40 years old so it must surely be wrong: https://www.folklore.org/Negative_2000_Lines_Of_Code.html > When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000 | |
| ▲ | leptons 2 days ago | parent | prev [-] | | I trust my offshore engineers way more than the slop I get from the "AI"s. My team makes my life a lot easier, because I know they know what they are doing. The LLMs, not so much. |
| |
| ▲ | PunchyHamster 2 days ago | parent | prev | next [-] | | Now that entirely depends on app. A lot of software industry is popping out and maintaining relatively simple apps with small differences and customizations per client. | |
| ▲ | babelfish 2 days ago | parent | prev [-] | | [citation needed] | | |
| |
| ▲ | throwaway173738 2 days ago | parent | prev | next [-] | | It matters for all the things you’d be able to justify paying a programmer for. What’s about to change is that there will be tons of these little one-off projects that previously nobody could justify paying $150/hr for. A mass democratization of software development. We’ve yet to see what that really looks like. | | |
| ▲ | inopinatus 2 days ago | parent [-] | | We already know what that looks like, because PHP happened. | | |
| ▲ | oenton 2 days ago | parent | next [-] | | Side tangent: On one hand I have a subtle fondness for PHP, perhaps because it was the first programming language I ever “learned” (self taught, throwing spaghetti on the wall) back in high school when LAMP stacks were all the rage. But in retrospect it’s absolutely baffling that mixing raw SQL queries with HTML tag soup wasn’t necessarily uncommon then. Also, I haven’t met many PHP developers that I’d recommend for a PHP job. | |
| ▲ | throwaway173738 2 days ago | parent | prev | next [-] | | php was still fundamentally a programming language you had to learn. This is “I wanted to make a program for my wife to do something she doesn’t have time to do manually” but made quickly with a machine. It’s probably going to do for programming what the Jacquard Loom did for cloth. Make it cheap enough that everyone can have lots of different shirts of their own style. | | |
| ▲ | jasonfarnon 2 days ago | parent | next [-] | | But the wife didn't do it herself. He still had to do it for her, the author says. I don't think (yet) we're at the point where every person who has an idea for a really good app can make it happen. They'll still need a wozniak, it's just that wozniaks will be a dime a dozen. The php analogy works. | |
| ▲ | inopinatus 2 days ago | parent | prev [-] | | What the Jacquard machine did for cloth was turn it into programming. |
| |
| ▲ | Yizahi 2 days ago | parent | prev | next [-] | | And low-code/no-code (pre-LLMs). Our company spent probably the same amount of dev-time and money on rewriting low-code back to "code" (Python in our case) as it did writing low-code in the first place. LLMs are not quite comparable in damage, but some future maintenance for LLM-code will be needed for sure. | |
| ▲ | scotty79 2 days ago | parent | prev | next [-] | | Right. Basically cambrian explosion of internet that spawned things like Facebook and WordPress. | |
| ▲ | qwm 2 days ago | parent | prev [-] | | ahahahaha so many implications in this comment |
|
| |
| ▲ | Ancapistani 2 days ago | parent | prev | next [-] | | > Their thesis is that code quality does not matter as it is now a cheap commodity. That's not how I read it. I would say that it's more like "If a human no longer needs to read the code, is it important for it to be readable?" That is, of course, based on the premise that AI is now capable of both generating and maintaining software projects of this size. Oh, and it raises another question: are human-readable and AI-readable the same thing? If they're not, it very well could make sense to instruct the model to generate code that prioritizes what matters to LLMs over what matters to humans. |
| ▲ | multisport 2 days ago | parent | prev | next [-] | | Yes agreed, and tbh even if that thesis is wrong, what does it matter? | | |
| ▲ | lacunary 2 days ago | parent | next [-] | | In my experience, what happens is the code base starts to collapse under its own weight. It becomes impossible to fix one thing without breaking another. The coding agent fails to recognize the global scope of the problem and tries local fixes over and over. Progress gets slower, new features cost more. All the same problems faced by an inexperienced developer on a greenfield project! Has your experience been otherwise? | |
| ▲ | ewoodrich 2 days ago | parent | next [-] | | Right, I am a daily user of agentic LLM tools and have this exact problem in one large project that has complex business logic externally dictated by real-world requirements out of my control, and, let's say, variable quality of legacy code. I remember when Gemini 3 Pro was the latest hotness and I started to get FOMO seeing demos on X posted to HN showing it one-shotting all sorts of impressive stuff. So I tried it out for a couple days in Gemini CLI/OpenCode and ran into the exact same pain points I was dealing with using CC/Codex. Flashy one-shot demos of greenfield prompts are a natural hype magnet and so get lots of attention, but in my experience aren't particularly useful for evaluating value in complex, legacy projects with tightly bounded requirements that can't be easily reduced to a page or two of prose for a prompt. | |
| ▲ | swat535 2 days ago | parent | next [-] | | To be fair, you're not supposed to be doing the "one shot" thing with LLMs in a mature codebase. You have to supply it the right context with a well formed prompt, get a plan, then execute and do some cleanup. LLMs are only as good as the engineers using them, you need to master the tool first before you can be productive with it. | | |
| ▲ | ewoodrich 2 days ago | parent [-] | | I’m well aware, as I said I am regularly using CC/Codex/OC in a variety of projects, and I certainly didn’t claim that can’t be used productively in a large code base. But that different challenges become apparent that aren’t addressed by examples like this article which tend to focus on narrow, greenfield applications that can be readily rebuilt in one shot. I already get plenty of value in small side projects that Claude can create in minutes. And while extremely cool, these examples aren’t the kind of “step change” improvement I’d like to see in the area where agentic tools are currently weakest in my daily usage. |
| |
| ▲ | gf000 2 days ago | parent | prev | next [-] | | I would be much more impressed with implementing new, long-requested features into existing software (that are open to later maintain LLM-generated code). | | |
| ▲ | ewoodrich 2 days ago | parent [-] | | Fully agreed! That’s the exact kind of thing I was hoping to find when I read the article title, but unfortunately it was really just another “normal AI agent experience” I’ve seen (and built) many examples of before. |
| |
| ▲ | 2 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | rectang 2 days ago | parent | prev | next [-] | | Adding capacity to software engineering through LLMs is like adding lanes to a highway — all the new capacity will be utilized. By getting the LLM to keep changes minimal I’m able to keep quality high while increasing velocity to the point where productivity is limited by my review bandwidth. I do not fear competition from junior engineers or non-technical people wielding poorly-guided LLMs for sustained development. Nor for prototyping or one offs, for that matter — I’m confident about knowing what to ask for from the LLM and how to ask. | |
| ▲ | baq 2 days ago | parent | prev | next [-] | | This is relatively easily fixed with increasing test coverage to near 100% and lifting critical components into model checker space; both approaches were prohibitively expensive before November. They’ll be accepted best practices by the summer. | |
| ▲ | multisport 2 days ago | parent | prev | next [-] | | No, that has certainly been my experience as well. But what's going to be the forcing function to go back to hiring after a company decides it needs fewer engineers? |
| ▲ | tjr 2 days ago | parent | prev [-] | | Why not have the LLM rewrite the entire codebase? | | |
| ▲ | rcoder 2 days ago | parent [-] | | In ~25 years or so of dealing with large, existing codebases, I've seen time and time again that there's a ton of business value and domain knowledge locked up inside all of that "messy" code. Weird edge cases that weren't well covered in the design, defensive checks and data validations, bolted-on extensions and integrations, etc., etc. "Just rewrite it" is usually -- not always, but _usually_ -- a sure path to a long, painful migration that usually ends up not quite reproducing the old features/capabilities and adding new bugs and edge cases along the way. | | |
| ▲ | rectang 2 days ago | parent | next [-] | | Classic Joel Spolsky: https://www.joelonsoftware.com/2000/04/06/things-you-should-... > the single worst strategic mistake that any software company can make: > rewrite the code from scratch. | | |
| ▲ | nl 2 days ago | parent [-] | | Steve Yegge talks about this exact post a lot - how it stayed correct advice for over 25 years - up until October 2025. | | |
| ▲ | rectang 2 days ago | parent [-] | | Time will tell. I’d bet on Spolsky, because of Hyrum’s Law. https://www.hyrumslaw.com/ > With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. An LLM rewriting a codebase from scratch is only as good as the spec. If “all observable behaviors” are fair game, the LLM is not going to know which of those behaviors are important. Furthermore, Spolsky talks about how to do incremental rewrites of legacy code in his post. I’ve done many of these and I expect LLMs will make the next one much easier. | |
| ▲ | nojito 2 days ago | parent [-] | | >An LLM rewriting a codebase from scratch is only as good as the spec. If “all observable behaviors” are fair game, the LLM is not going to know which of those behaviors are important. I've been using LLMs to write docs and specs and they are very very good at it. | | |
| ▲ | rectang 2 days ago | parent [-] | | That’s a fair point — I agree that LLMs do a good job predicting the documentation that might accompany some code. I feel relieved when I can rely on the LLM to write docs that I only need to edit and review. But I’m using LLMs regularly and, I feel, pretty effectively — including Opus 4.5 — and these “they can rewrite your entire codebase” assertions just seem wildly incongruous with my lived experience guiding LLMs to write even individual features bug-free. |
|
|
|
| |
| ▲ | what-the-grump 2 days ago | parent | prev | next [-] | | When an LLM can rewrite it in 24 hours and fill in the missing parts in minutes, that argument is hard to defend. I can vibe code what a dev shop would charge $500k to build, and I can solo it in 1-2 weeks. This is the reality today. The code will pass quality checks; the code doesn’t need to be perfect, it doesn’t need to be clever, it needs to be. It’s not difficult to see this, right? If an LLM can write English it can write Chinese or Python. Then it can run itself, review itself and fix itself. The cat is out of the bag, and as for what it will do to the economy… I don’t see anything positive for regular people. "Write some code" has turned into "prompt some LLM". My phone can outplay the best chess player in the world; are you telling me you think that whatever unbound model Anthropic has sitting in their data center can’t out-code you? | |
| ▲ | gf000 2 days ago | parent [-] | | Well, where is your competitor to mainstream software products? | | |
| ▲ | what-the-grump a day ago | parent [-] | | What mainstream software product do I use on a day-to-day basis besides Claude? The ones that continue to survive all build around a platform of services: MSO, Adobe, etc. Most enterprise product offerings, platform solutions, proprietary data access, proprietary / well-accepted implementations. But let's not confuse that with the ability to clone them; it doesn't seem far-fetched to get 10 people together and vibe out a full Slack replacement in a few weeks. |
|
| |
| ▲ | tjr 2 days ago | parent | prev [-] | | If the LLM just wrote the whole thing last week, surely it can write it again. | | |
| ▲ | tavavex 2 days ago | parent | next [-] | | If an LLM wrote the whole project last week and it already requires a full rewrite, what makes you think that the quality of that rewrite will be significantly higher, and that it will address all of the issues? Sure, it's all probabilistic so there's probably a nonzero chance for it to stumble into something where all the moving parts are moving correctly, but to me it feels like with our current tech, these odds continue shrinking as you toss on more requirements and features, like any mature project. It's like really early LLMs where if they just couldn't parse what you wanted, past a certain point you could've regenerated the output a million times and nothing would change. | |
| ▲ | unloader6118 2 days ago | parent | prev | next [-] | | * With a slightly different set of assumptions, which may or may not matter. UAT is cheap. And data migration is lossy, because nobody cares about data fidelity anyway. |
| ▲ | grugagag 2 days ago | parent | prev [-] | | Broken though |
|
|
|
| |
| ▲ | whatever1 2 days ago | parent | prev [-] | | The whole point of good engineering was not just hitting the hard specs, but also having extensible, readable, maintainable code. But if today it’s so cheap to generate new code that meets updated specs, why care about the quality of the code itself? Maybe the engineering work today is to review specs and tests and let LLMs do whatever behind the scenes to hit the specs. If the specs change, just start from scratch. | |
| ▲ | majormajor 2 days ago | parent | next [-] | | "Write the specs and let the outsourced labor hit them" is not a new tale. Let's assume the LLM agents can write tests for, and hit, specs better and cheaper than the outsourced offshore teams could. So let's assume now you can have a working product that hits your spec without understanding the code. How many bugs and security vulnerabilities have slipped through "well tested" code because of edge cases of certain input/state combinations? Ok, throw an LLM at the codebase to scan for vulnerabilities; ok, throw another one at it to ensure no nasty side effects of the changes that one made; ok, add some functionality and a new set of tests and let it churn through a bunch of gross code changes needed to bolt that functionality into the pile of spaghetti... How long do you want your critical business logic relying on not-understood code with "100% coverage" (of lines of code and spec'd features) but super-low coverage of actual possible combinations of input+machine+system state? How big can that codebase get before "rewrite the entire world to pass all the existing specs and tests" starts getting very very very slow? We've learned MANY hard lessons about security, extensibility, and maintainability of multi-million-LOC-or-larger long-lived business systems, and those don't go away just because you're no longer reading the code that's making you the money. They might even get more urgent. Is there perhaps a reason Google and Amazon didn't just hire 10x the number of people at 1/10th the salary to replace the vast majority of their engineering teams years ago? |
| ▲ | andrekandre 2 days ago | parent | prev | next [-] | | > let LLMs do whatever behind the scenes to hit the specs
assuming for the sake of argument that's completely true, then what happens to "competitive advantage" in this scenario? it gets me thinking: if anyone can vibe from spec, what's stopping company a (or even user a) from telling an llm agent "duplicate every aspect of this service in python and deploy it to my aws account xyz"... in that scenario, why even have companies? | |
| ▲ | mskogly 2 days ago | parent | next [-] | | It’s all fun and games vibecoding until you
A) have customers who depend on your product
B) it breaks, or the one person who does the prompting and has access to the servers and API keys gets incapacitated (or just bored). Sure, we can vibecode one-off projects that do something useful (my fav is browser extensions), but as soon as we ask others to use our code on a regular basis, the technical debt clock starts running. And we all know how fast dependencies in a project break. | |
| ▲ | nl 2 days ago | parent | prev | next [-] | | You can do this for many things now. Walmart, McDonald's, Nike - none really have any secrets about what they do. There is nothing stopping someone from copying them - except that businesses are big, unwieldy things. When software becomes cheap, companies compete on their support. We see this for Open Source software now. | |
| ▲ | gf000 2 days ago | parent | next [-] | | These are businesses with extra-large capital requirements. You ain't replicating them, because you don't have the money, and they can easily strangle you with their money as you start out. Software is different: you need very, very little to start, historically just your own skills and time. The latter two may see some changes with LLMs. | |
| ▲ | TeodorDyakov 2 days ago | parent | next [-] | | How conveniently you forgot about the most important things for a product to make money - marketing and the network effect.... | |
| ▲ | gf000 2 days ago | parent [-] | | I don't see the relevance to the discussion. Marketing is not significantly different for a shop and an online-only business. Having to buy a large property, fulfilling every law, etc. is materially different from buying a laptop and renting a cloud instance. Almost everyone has the material capacity to do the latter, but almost no one has the privilege for the former. |
| |
| ▲ | nl 21 hours ago | parent | prev [-] | | This is exactly my point. |
|
| |
| ▲ | whatever1 2 days ago | parent | prev [-] | | The business is identifying the correct specs and filtering the customer needs/requests so that the product does not become irrelevant. | |
| ▲ | ehnto 2 days ago | parent | next [-] | | Okay, we will copy that version of the product too. There is more to it than the code and software provided in most cases I feel. | |
| ▲ | majormajor 2 days ago | parent | prev [-] | | I think `andrekandre is right in this hypothetical. Who'd pay for a brand-new Photoshop with a couple new features and improvements if LLM-cloned Photoshop-from-three-months-ago is free? The first few iterations of this could be massively consumer friendly for anything without serious cloud infra costs. Cheap clones all around. Like generic drugs but without the cartel-like control of manufacturing. Business after that would be dramatically different, though. Differentiating yourself from the willing-to-do-it-for-near-zero-margin competitors to produce something new to bring in money starts to get very hard. Can you provide better customer support? That could be hard, everyone's gonna have a pretty high baseline LLM-support-agent already... and hiring real people instead could dramatically increase the price difference you're trying to justify... Similarly for marketing or outreach etc; how are you going to cut through the AI-agent-generated copycat spam that's gonna be pounding everyone when everyone and their dog has a clone of popular software and services? Photoshop type things are probably a really good candidate for disruption like that because to a large extent every feature is independent. The noise reduction tool doesn't need API or SDK deps on the layer-opacity tool, for instance. If all your features are LLM balls of shit that doesn't necessarily reduce your ability to add new ones next to them, unlike in a more relational-database-based web app with cross-table/model dependencies, etc. And in this "try out any new idea cheaply and throw crap against the wall and see what sticks" world "product managers" and "idea people" etc are all pretty fucked. Some of the infinite monkeys are going to periodically hit to gain temporary advantage, but good luck finding someone to pay you to be a "product visionary" in a world where any feature can be rolled out and tested in the market by a random dev in hours or days. | |
| ▲ | fragmede 2 days ago | parent [-] | | OK, so what do people do? What do people need? People still need to eat, people get married and die, and all of the things surrounding that, all sorts of health related stuff. Nightlife events. Insurance. actuaries. Raising babies. What do you spend your fun money on? People pay for things they use. If bespoke software is a thing you pick up at the mall at a kiosk next to Target we gotta figure something out. |
|
|
| |
| ▲ | PunchyHamster 2 days ago | parent | prev [-] | | It's all fine till money starts being involved and whoopsies cost more than a few hours of fixing. |
|
| |
| ▲ | sksishbs 2 days ago | parent | prev [-] | | [dead] |
|
|
| ▲ | coldtea 2 days ago | parent | prev | next [-] |
>What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects They do get those occasionally too, though. Depends on the company. In some software houses it's constant "greenfield projects", one after another. And even in companies with 1-2 pieces of main established software to maintain, there are all kinds of smaller utilities or pipelines needed. >But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right". In some cases that's legit. In other cases it's just "it did it well, but not how I'd have done it", which is often needless stickiness to some particular style (often a point of contention between 2 human programmers too). Basically, what FloorEgg says in this thread: "There are two types of right/wrong ways to build: the context specific right/wrong way to build something and an overly generalized engineer specific right/wrong way to build things." And you can always not just tell it "build me this feature", but tell it (at a high level) how to do it, and give it a generic context about such preferences too. |
|
| ▲ | coryrc 2 days ago | parent | prev | next [-] |
> its building it the right way, in an easily understood way, in a way that's easily extensible. When I worked at Google, people rarely got promoted for doing that. They got promoted for delivering features, or sometimes for rescuing a failing project, because everyone kept doing the former until promotion velocity dropped and the good people left for other projects not yet bogged down too far. |
|
| ▲ | lallysingh 2 days ago | parent | prev | next [-] |
Yeah. Just like another engineer. When you tell another engineer to build you a feature, it's improbable they'll do it the way that you consider "right." This sounds a lot like the old arguments around using compilers vs hand-writing asm. But now you can tell the LLM how you want to implement the changes you want. This will become more and more relevant as we try to maintain the code it generates. But, for right now, another thing Claude's great at is answering questions about the codebase. It'll do the analysis and bring up reports for you. You can use that information to guide the instructions for changes, or just to help you be more productive. |
|
| ▲ | patates 2 days ago | parent | prev | next [-] |
You can look at my comment history to see the evidence of how hostile I was to agentic coding. Opus 4.5 completely changed my opinion. This thing jumped into a giant JSF (yes, JSF) codebase and started fixing things with nearly zero guidance. |
|
| ▲ | EthanHeilman 2 days ago | parent | prev | next [-] |
Even if you are going greenfield, you need to build it the way it is likely to be used, based on having a deep familiarity with what that customer's problems are and how their current workflow is done. As much as we imagine everything is on the internet, a bunch of this stuff is not documented anywhere. An LLM could ask the customer requirement questions but that familiarity is often needed to know the right questions to ask. It is hard to bootstrap. Even if it could build the perfect greenfield app, as it updates the app it needs to consider backwards compatibility and breaking changes. LLMs seem very far from being able to grow apps. I think this is because LLMs are trained on the final outcome of the engineering process, but not on the incremental sub-commit work of first getting a faked-out outline of the code running and then slowly building up that code until you have something that works. This isn't to say that LLMs or other AI approaches couldn't replace software engineering some day, but they clearly aren't good enough yet and the training sets they currently have access to are unlikely to provide the needed examples. |
|
| ▲ | qwm 2 days ago | parent | prev | next [-] |
| My favorite benchmark for LLMs and agents is to have it port a medium-complexity library to another programming language. If it can do that well, it's pretty capable of doing real tasks. So far, I always have to spend a lot of time fixing errors. There are also often deep issues that aren't obvious until you start using it. |
| |
| ▲ | Rastonbury 2 days ago | parent [-] | | Comments on here often criticise ports as easy for LLMs to do because there's a lot of training data and the tests are all there, which is not as complex as real-world tasks |
|
|
| ▲ | ivanech 2 days ago | parent | prev | next [-] |
| I find Opus 4.5 very, very strong at matching the prevailing conventions/idioms/abstractions in a large, established codebase. But I guess I'm quite sensitive to this kind of thing so I explicitly ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though. |
| |
| ▲ | falkensmaize 2 days ago | parent | next [-] | | I don’t know what I’m doing wrong. Today I tried to get it to upgrade Nx, yarn and some resolutions in a typescript monorepo with about 20 apps at work (Opus 4.5 through Kiro) and it just…couldn’t do it. It hit some snags with some of the configuration changes required by the upgrade and resorted to trying to make unwanted changes to get it to build correctly. I would have thought that’s something it could hit out of the park. I finally gave up and just looked at the docs and some stack overflow and fixed it myself. I had to correct it a few times about correct config params too. It kept imagining config options that weren’t valid. | |
| ▲ | tac19 2 days ago | parent | prev [-] | | > ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though. People keep telling me that an LLM is not intelligence, it's simply spitting out statistically relevant tokens. But surely it takes intelligence to understand (and actually execute!) the request to "read adjacent code". | | |
| ▲ | latentsea 2 days ago | parent [-] | | I used to agree with this stance, but lately I'm more in the "LLMs are just fancy autocomplete" camp. They can just autocomplete increasingly more things, and when they can't, they fail in ways that an intelligent being just wouldn't; rather, they just output a wrong or useless autocompletion. | |
| ▲ | tac19 2 days ago | parent | next [-] | | They're not an intelligence equivalent to humans' and thus have noticeably different failure modes. But humans fail in ways that they don't (e.g. being unable to match LLMs' breadth and depth of knowledge). But the question I'm really asking is... isn't it more than a sheer statistical "trick" if an LLM can actually be instructed to "read surrounding code", understand the request, and demonstrably include it in its operation? You can't do that unless you actually understand what "surrounding code" is, and more importantly have a way to comply with the request... | |
| ▲ | baq 2 days ago | parent | prev [-] | | In a sense humans are fancy autocomplete, too. | | |
| ▲ | latentsea 2 days ago | parent | next [-] | | I actually don't disagree with this sentiment. The difference is we've optimised for autocompleting our way out of situations we currently don't have enough information to solve, and LLMs have gone the opposite direction of over-indexing on too much "autocomplete the thing based on current knowledge". At this point I don't doubt that whatever human intelligence is, it's a computable function. | |
| ▲ | suddenlybananas 2 days ago | parent | prev [-] | | You know that language had to emerge at some point? LLMs can only do anything because they have been fed on human data. Humans actually had to collectively come up with languages /without/ anything to copy since there was a time before language. |
|
|
|
|
|
| ▲ | miki123211 2 days ago | parent | prev | next [-] |
| In my personal experience, Claude is better at greenfield, Codex is better at fitting in. Claude is the perfect tool for a "vibe coder", Codex is for the serious engineer who wants to get great and real work done. Codex will regularly give me 1000+ line diffs where all my comments (I review every single line of what agents write) are basically nitpicks. "Make this shallow w/ early return, use | None instead of Optional", that sort of thing. I do prompt it in detail though. It feels like I'm the person coming in with the architecture most of the time, AI "draws the rest of the owl." |
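For anyone who hasn't run into those two review comments, here is a minimal Python sketch of what they translate to; the function and the toy data are hypothetical, purely to illustrate "shallow with early return" and "| None instead of Optional":

    from __future__ import annotations

    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class User:
        email: str | None


    USERS = {1: User(email="a@example.com")}  # toy data for the sketch


    # Before review: nested conditionals and the older Optional[...] spelling.
    def find_user_email_before(user_id: int) -> Optional[str]:
        user = USERS.get(user_id)
        if user is not None:
            if user.email:
                return user.email
        return None


    # After review: flat early return and the PEP 604 "str | None" union.
    def find_user_email_after(user_id: int) -> str | None:
        user = USERS.get(user_id)
        if user is None or not user.email:
            return None
        return user.email


    assert find_user_email_before(1) == find_user_email_after(1) == "a@example.com"

Both versions behave identically; the nitpicks are purely about readability, which is the kind of comment that only surfaces if someone actually reads the diff.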
|
| ▲ | colechristensen 2 days ago | parent | prev | next [-] |
>day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right" Then don't ask it to "build me this feature"; instead, lay out a software development process with a designated human in the loop where you want it and guard rails to keep it on track. Create a code review agent to look for and reject strange abstractions. Tell it what you don't like and it's really good at finding it. I find Opus 4.5, properly prompted, to be significantly better at reviewing code than writing it, but you can just put it in a loop until the code it writes matches the review. |
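As a rough illustration of that write-then-review loop (a sketch, not anyone's actual setup), here is what it can look like in Python; run_agent() is a hypothetical stand-in for however you invoke your coding agent, and the review rules are just examples:

    # Sketch of a generate-then-review loop with a separate "review agent".
    # run_agent() is a hypothetical placeholder: wire it up to Claude Code,
    # an API call, or whatever agent tooling you actually use.

    REVIEW_RULES = (
        "Reject diffs that introduce abstractions not already used in this repo, "
        "add indirection with only a single caller, or change public interfaces "
        "without being asked to."
    )


    def run_agent(prompt: str) -> str:
        raise NotImplementedError("connect this to your coding agent of choice")


    def build_with_review(task: str, max_rounds: int = 3) -> str:
        feedback = "None yet."
        for _ in range(max_rounds):
            # Writer pass: implement the task, addressing prior review feedback.
            diff = run_agent(
                f"Implement this task as a diff: {task}\n"
                f"Outstanding review feedback to address: {feedback}"
            )
            # Reviewer pass: a fresh prompt that acts only as the gatekeeper.
            review = run_agent(
                f"Review this diff against these rules: {REVIEW_RULES}\n\n{diff}\n\n"
                "Reply with exactly APPROVED, or list the problems."
            )
            if review.strip().startswith("APPROVED"):
                return diff
            feedback = review
        raise RuntimeError("review loop did not converge; hand it to a human")

The human stays in the loop wherever you put them: approving the final diff, editing the review rules, or handling the escalation when the loop doesn't converge.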
|
| ▲ | Madmallard 2 days ago | parent | prev | next [-] |
Based on my experience using these LLMs regularly, I strongly doubt they could even build an application of realistic complexity without screwing things up in major ways everywhere, and even then they still wouldn't meet all the requirements. |
|
| ▲ | Balinares 2 days ago | parent | prev | next [-] |
| Exactly. The main issue IMO is that "software that seems to work" and "software that works" can be very hard to tell apart without validating the code, yet these are drastically different in terms of long-term outcomes. Especially when there's a lot of money, or even lives, riding on these outcomes. Just because LLMs can write software to run the Therac-25 doesn't mean it's acceptable for them to do so. Your hobby project, though, knock yourself out. |
|
| ▲ | nialse a day ago | parent | prev | next [-] |
After recently applying Codex to a gigantic old and hairy project that is as far from greenfield as it can be, I can assure you this assertion is false. It’s bonkers seeing 5.2 churn through the complexity and understand dependencies that would take me days or weeks to wrap my head around. |
|
| ▲ | KentLatricia 2 days ago | parent | prev | next [-] |
Another thing these posts assume is a single developer who keeps working on the product with a number of AI agents, not a large team. I think we need to rethink how teams work with AI. It's probably not gonna be a single developer typing a prompt but a team somehow collaborating on a prompt or equivalent. XP on steroids? Programming by committee? |
|
| ▲ | avereveard 2 days ago | parent | prev | next [-] |
But... you can ask! Ask Claude to use encapsulation, or to write the equivalent of interfaces in the language you're using, and to map out dependencies and duplicate features, or to maintain a dictionary of component responsibilities. AI coding is a multiplier of writing speed but doesn't excuse you from planning out and mapping out features. You can have reasonably engineered code if you get models to stick to well-designed modules, but you need to tell them. |
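To make "the equivalent of interfaces" concrete, here is a minimal Python sketch (the names are hypothetical): you define the boundary yourself and ask the model to implement against it rather than inventing its own abstractions.

    from typing import Protocol


    class OrderRepository(Protocol):
        """The boundary you hand to the model: implement against this, nothing else."""

        def get(self, order_id: str) -> dict: ...

        def save(self, order: dict) -> None: ...


    class InMemoryOrderRepository:
        """Tiny reference implementation the model can mirror when writing the real backend."""

        def __init__(self) -> None:
            self._orders: dict[str, dict] = {}

        def get(self, order_id: str) -> dict:
            return self._orders[order_id]

        def save(self, order: dict) -> None:
            self._orders[order["id"]] = order

Code that depends only on OrderRepository doesn't care whether the model's implementation talks to a database or a dict, which is the encapsulation the comment is asking for.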
| |
| ▲ | verall 2 days ago | parent [-] | | But the time I spend asking is time I could have spent writing exactly what I wanted in the first place, if I'd already done the planning to understand what I wanted. Once I know what I want, it doesn't take that long, usually. Which is why it's so great for prototyping, because it can create something during the planning, when you haven't planned out quite what you want yet. |
|
|
| ▲ | AndrewKemendo 2 days ago | parent | prev | next [-] |
| > The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible. The number of production applications that achieve this rounds to zero I’ve probably managed 300 brownfield web, mobile, edge, datacenter, data processing and ML applications/products across DoD, B2B, consumer and literally zero of them were built in this way |
| |
| ▲ | kaashif 2 days ago | parent [-] | | I think there is a subjective difference. When a human builds dogshit at least you know they put some effort and the hours in. When I'm reading piles of LLM slop, I know that just reading it is already more effort than it took to write. It feels like I'm being played. This is entirely subjective and emotional. But when someone writes something with an LLM in 5 seconds and asks me to spend hours reviewing...fuck off. | | |
| ▲ | parpfish 2 days ago | parent | next [-] | | If you are heavily using LLMs, you need to change the way you think about reviews I think most people now approach it as:
Dev0 uses an LLM to build a feature super fast, Dev1 spends time doing an in-depth review. Dev0 built it, Dev1 reviewed it. And Dev0 is happy because they used the tool to save time! But what should happen is that Dev0 should take all that time they saved coding and reallocate it to the in-depth review. The LLM wrote it, Dev0 reviewed it, Dev1 double-reviewed it. Time savings are much less, but there’s less context switching between being a coder and a reviewer. We are all reviewers now all the time |
| ▲ | PunchyHamster 2 days ago | parent [-] | | Can't do that, else KPIs won't show that AI tools reduced the amount of coding work by xx% |
| |
| ▲ | AndrewKemendo 2 days ago | parent | prev [-] | | Your comment doesn’t address what I said and instead finds a new reason that it’s invalid because “reviewing code from a machine system is beneath me” Get over yourself |
|
|
|
| ▲ | noodletheworld 2 days ago | parent | prev | next [-] |
It might scale. So far, I'm not convinced, but let's take a look at what's fundamentally happening and why humans > agents > LLMs. At its heart, programming is a constraint satisfaction problem. The more constraints (requirements, syntax, standards, etc.) you have, the harder it is to solve them all simultaneously. New projects with few contributors have fewer constraints. The process of “any change” is therefore simpler. Now, undeniably 1) agents have improved the ability to solve constraints by iterating; e.g. generate, test, modify, etc. over raw LLM output. 2) There is an upper bound (context size, model capability) on solving simultaneous constraints. 3) Most people have a better ability to do this than agents (including Claude Code using Opus 4.5). So, if you're seeing good results from agents, you probably have a smaller set of constraints than other people. Similarly, if you're getting bad results, you can probably improve them by relaxing some of the constraints (consistent UI, number of contributors, requirements, standards, security requirements, splitting code into well-defined packages). This will make both agents and humans more productive. The open question is: will models continue to improve enough to approach or exceed human-level ability in this? Are humans willing to relax the constraints enough for it to be plausible? I would say currently people clamoring about the end of human developers are cluelessly deceived by the “appearance of complexity” which does not match the “reality of constraints” in larger applications. Opus 4.5 cannot do the work of a human on code bases I've worked on. Hell, talented humans struggle to work on some of them. …but that doesn't mean it doesn't work. Just that, right now, the constraint set it can solve is not large enough to be useful in those situations. …and increasingly we see low-quality software where people care only about speed of delivery; again, lowering the bar in terms of requirements. So… you know. Watch this space. I'm not counting on having a dev job in 10 years. If I do, it might be making a pile of barely working garbage. …but I have one now, and anyone who thinks that this year people will be largely replaced by AI is probably poorly informed and has misunderstood the capabilities of these models. There's only so low you can go in terms of quality. |
|
| ▲ | herpdyderp 2 days ago | parent | prev | next [-] |
| On the contrary, Opus 4.5 is the best agent I’ve ever used for making cohesive changes across many files in a large, existing codebase. It maintains our patterns and looks like all the other code. Sometimes it hiccups for sure. |
|
| ▲ | Havoc 2 days ago | parent | prev | next [-] |
| > greenfield LLMs are pretty good at picking up existing codebases. Even with cleared context they can do „look at this codebase and this spec doc that created it. I want to add feature x“ |
| |
| ▲ | le-mark 2 days ago | parent [-] | | What size of code base are you talking about? And this is your personal experience? | | |
| ▲ | Havoc 2 days ago | parent [-] | | Overall codebase size vs. context matters less when you set it up as a microservices-style architecture from the start. I just split it into boundaries that make sense to me. Get the LLM to make a quick cheat sheet about the API and then feed that into adjacent modules. It doesn’t need to know everything about all of it to make changes if you’ve got a grip on the big picture and the boundaries are somewhat sane | |
| ▲ | onion2k 2 days ago | parent | next [-] | | Overall codebase size vs. context matters less when you set it up as a microservices-style architecture from the start. It'll be fun if the primary benefit of microservices turns out to be that LLMs can understand the codebase. | |
| ▲ | baq 2 days ago | parent [-] | | That was the whole point for humans, too. | | |
| ▲ | gf000 2 days ago | parent [-] | | Except it doesn't work for humans, the same way it won't work for LLMs. If you use too many microservices, you will get global state, race conditions, and much more complex failure models again, and no human/LLM can effectively reason about those. We somewhat have tools to do that in the case of monoliths, but if one gets to this point with microservices, it's game over. |
|
| |
| ▲ | magicalist 2 days ago | parent | prev | next [-] | | So "pretty good at picking up existing codebases" so long as the existing codebase is all microservices. | | |
| ▲ | heartbreak 2 days ago | parent | next [-] | | Or a Rails app. | |
| ▲ | enraged_camel 2 days ago | parent | prev [-] | | I work with multiple monoliths that span anywhere from 100k to 500k lines of code, in a non-mainstream language (Elixir). Opus 4.5 crushes everything I throw at it: complex bugs, extending existing features, adding new features in a way that matches conventions, refactors, migrations... The only time it struggles is if my instructions are unclear or incomplete. For example if I ask it to fix a bug but don't specify that such-and-such should continue to work the way it does due to an undocumented business requirement, Opus might mess that up. But I consider that normal because a human developer would also fail at it. | |
| ▲ | aprilthird2021 2 days ago | parent [-] | | With all due respect those are very small codebases compared to the kinds of things a lot of software engineers work on. |
|
| |
| ▲ | phito 2 days ago | parent | prev [-] | | It doesn't have to be microservices, just code that is decoupled properly, so it can search and build its context easily. |
|
|
|
|
| ▲ | volkanvardar 2 days ago | parent | prev | next [-] |
I totally agree. And welcome to the disposable software age. |
|
| ▲ | epolanski 2 days ago | parent | prev | next [-] |
Yeah, all of those applications he shows do not really expose any complex business logic. With all due respect: a file converter for Windows is gluing a few Windows APIs with the relevant codec. Now, good luck working on a complex warehouse management application where you need extremely complex logic to sort the order of picking, assembling, and packing based on an infinite number of variables: weight, Amazon Prime priority, distribution centers, number and type of carts available, number and type of assembly stations available, different delivery systems and requirements for different delivery operators (such as GLE, DHL, etc.) that have to work with N customers all requiring slightly different capabilities and flows, all having different printers and operations, etc, etc. And I ain't even scratching the surface of the business logic complexity (not even mentioning functional requirements) to avoid boring the reader. Mind you, AI is still tremendously useful in the analysis phase, and can sort of help in some steps of the implementation one, but the number of times you can avoid looking thoroughly at the code for any minor issue or discrepancy is absolutely close to 0. |
|
| ▲ | fooker 2 days ago | parent | prev | next [-] |
| It just one shots bug fixes in complex codebases. Copy-paste the bug report and watch it go. |
|
| ▲ | wilg 2 days ago | parent | prev | next [-] |
you can definitely just tell it what abstractions you want when adding a feature and do incremental work on an existing codebase. but i generally prefer gpt-5.2 |
| |
| ▲ | boppo1 2 days ago | parent [-] | | I've been using 5.2 a lot lately but hit my quota for the first time (and will probably continue to hit it most weeks) so I shelled out for claude code. What differences do you notice? Any 'metagame' that would be helpful? | | |
| ▲ | wilg 2 days ago | parent [-] | | I just use Cursor because I can pick any mode. The difference is hard to say exactly, Opus seems good but 5.2 seems smarter on the tasks I tried. Or possibly I just "trust" it more. I tend to use high or extra high reasoning. |
|
|
|
| ▲ | scotty79 2 days ago | parent | prev | next [-] |
If you have a microservices architecture in your project, you are set for AI. You can swap out any lacking, legacy microservice in your system with a "greenfield" vibecoded one. |
|
| ▲ | kevinsync 2 days ago | parent | prev | next [-] |
| Man, I've been biting my tongue all day with regards to this thread and overall discussion. I've been building a somewhat-novel, complex, greenfield desktop app for 6 months now, conceived and architected by a human (me), visually designed by a human (me), implementation heavily leaning on mostly Claude Code but with Codex and Gemini thrown in the mix for the grunt work. I have decades of experience, could have built it bespoke in like 1-2 years probably, but I wanted a real project to kick the tires on "the future of our profession". TL;DR I started with 100% vibe code simply to test the limits of what was being promised. It was a functional toy that had a lot of problems. I started over and tried a CLI version. It needed a therapist. I started over and went back to visual UI. It worked but was too constrained. I started over again. After about 10 complete start-overs in blank folders, I had a better vision of what I wanted to make, and how to achieve it. Since then, I've been working day after day, screen after screen, building, refactoring, going feature by feature, bug after bug, exactly how I would if I was coding manually. Many times I've reached a point where it feels "feature complete", until I throw a bigger dataset at it, which brings it to its knees. Time to re-architect, re-think memory and storage and algorithms and libraries used. Code bloated, and I put it on a diet until it was trim and svelte. I've tried many different approaches to hard problems, some of which LLMs would suggest that truly surprised me in their efficacy, but only after I presented the issues with the previous implementation. There's a lot of conversation and back and forth with the machine, but we always end up getting there in the end. Opus 4.5 has been significantly better than previous Anthropic models. As I hit milestones, I manually audit code, rewrite things, reformat things, generally polish the turd. I tell this story only because I'm 95% there to a real, legitimate product, with 90% of the way to go still. It's been half a year. Vibe coding a simple app that you just want to use personally is cool; let the machine do it all, don't worry about under the hood, and I think a lot of people will be doing that kind of stuff more and more because it's so empowering and immediate. Using these tools is also neat and amazing because they're a force multiplier for a single person or small group who really understand what needs done and what decisions need made. These tools can build very complex, maintainable software if you can walk with them step by step and articulate the guidelines and guardrails, testing every feature, pushing back when it gets it wrong, growing with the codebase, getting in there manually whenever and wherever needed. These tools CANNOT one-shot truly new stuff, but they can be slowly cajoled and massaged into eventually getting you to where you want to go; like, hard things are hard, and things that take time don't get done for a while. 
I have no moral compunctions or philosophical musings about utilizing these tools, but IMO there's still significant effort and coordination needed to make something really great using them (and literally minimal effort and no coordination needed to make something passable). If you're solo, know what you want, and know what you're doing, I believe you might see 2x, 4x gains in time and efficiency using Claude Code and all of his magical agents, but if your project is more than a toy, I would bet that 2x or 4x is applied to a temporal period of years, not days or months! |
|
| ▲ | blitz_skull 2 days ago | parent | prev | next [-] |
| This is the exact copium I came here to enjoy. |
|
| ▲ | llm_nerd 2 days ago | parent | prev [-] |
| "its building it the right way, in an easily understood way, in a way that's easily extensible" I am in a unique situation where I work with a variety of codebases over the week. I have had no problem at all utilizing Claude Code w/ Opus 4.5 and Gemini CLI w/ Gemini 3.0 Pro to make excellent code that is indisputably "the right way", in an extremely clear and understandable way, and that is maximally extensible. None of them are greenfield projects. I feel like this is a bit of je ne sais quoi where people appeal to some indemonstrable essence that these tools just can't accomplish, and only the "non-technical" people are foolish enough to not realize it. I'm a pretty technical person (about 30 years of software development, up to staff engineer and then VP). I think they have reached a pretty high level of competence. I still audit the code and monitor their creations, but I don't think they're the oft claimed "junior developer" replacement, but instead do the work I would have gotten from a very experienced, expert-level developer, but instead of being an expert at a niche, they're experts at almost every niche. Are they perfect? Far from it. It still requires a practitioner who knows what they're doing. But frequently on here I see people giving takes that sound like they last used some early variant of Copilot or something and think that remains state of the art. The rest of us are just accelerating our lives with these tools, knowing that pretending they suck online won't slow their ascent an iota. |
| |
| ▲ | what 2 days ago | parent | next [-] | | >llm_nerd
>created two years ago You AI hype thots/bots are all the same. All these claims but never backed up with anything to look at. And also always claiming “you’re holding it wrong”. | |
| ▲ | pigpop 2 days ago | parent | next [-] | | I don't see how "two years ago" is incongruous with having been using LLMs for coding, it's exactly the timeline I would expect. Yes, some people do just post "git gud" but there are many people ITT and most of the others on LLM coding articles who are trying to explain their process to anyone who will listen. I'm not sure if it is fully explainable in a single comment though, I'd have to write a multi-part tutorial to cover everything but it's almost entirely just applying the same project management principles that you would in a larger team of developers but customized to the current limitations of LLMs. If you want full tutorials with examples I'm sure they're out there but I'd also just recommend reviewing some project management material and then seeing how you can apply it to a coding agent. You'll only really learn by doing. | |
| ▲ | llm_nerd 2 days ago | parent | prev [-] | | >You AI hype thots/bots are all the same This isn't twitter, so save the garbage rhetoric. And if you must question my account, I create a new account whenever I set up a new main PC, and randomly pick a username that is top of mind at the moment. This isn't professionally or personally affiliated in any way so I'm not trying to build a thing. I mean, if I had a 10-year-old account that only managed a few hundred upvotes despite prolific commenting, I'd probably delete it out of embarrassment though. >All these claims but never backed up with anything to look at Uh...install the tools? Use them? What does "to look at" even mean? Loads of people are using these tools to great effect, while some tiny minority tell us online that no way they don't work, etc. And at some point they'll pull their head out of the sand and write the followup "Wait, they actually do". |
| |
| ▲ | doxeddaily 2 days ago | parent | prev [-] | | I also have >30 years and I've had the same experience. I noticed an immediate improvement with 4.5 and I've been getting great results in general. And yes I do make sure it's not generating crazy architecture. It might do that.. if you let it. So don't let it. | | |
| ▲ | llm_nerd 2 days ago | parent [-] | | HN has a subset of users -- they're a minority, but they hit threads like this super hard -- who really, truly think that if they say that AI tools suck and are only for nubs loud enough and frequently enough, downvoting anyone who finds them useful, all AI advancements will unwind and it'll be the "good old days" again. It's rather bizarre stuff, but that's what happens when people in denial feel threatened. |
|
|