| ▲ | redhale 2 days ago |
| Not necessarily responding to you directly, but I find this take to be interesting, and I see it every time an article like this makes the rounds. Starting back in 2022/2023: |
| - (~2022) It can auto-complete one line, but it can't write a full function. |
| - (~2023) Ok, it can write a full function, but it can't write a full feature. |
| - (~2024) Ok, it can write a full feature, but it can't write a simple application. |
| - (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product. |
| - (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term. |
| It's pretty clear to me where this is going. The only question is how long it takes to get there. |
|
| ▲ | fernandezpablo 8 minutes ago | parent | next [-] |
| Starting back in 2022/2023: |
| - (~2022) "It's so over for developers". 2022 ends with more professional developers than 2021. |
| - (~2023) "Ok, now it's really over for developers". 2023 ends with more professional developers than 2022. |
| - (~2024) "Ok, now it's really, really over for developers". 2024 ends with more professional developers than 2023. |
| - (~2025) "Ok, now it's really, really, absolutely over for developers". 2025 ends with more professional developers than 2024. |
| - (~2025+) etc. |
| Sources: https://www.jetbrains.com/lp/devecosystem-data-playground/#g... |
|
| ▲ | arkensaw 2 days ago | parent | prev | next [-] |
| > It's pretty clear to me where this is going. The only question is how long it takes to get there. |
| I don't think it's a guarantee. All of the things it can do from that list are greenfield; they just have increasing complexity. The problem comes because even in agentic mode, these models do not (and I would argue, cannot) understand code or how it works; they just see patterns and generate a plausible-sounding explanation or solution. Agentic mode means they can try/fail/try/fail/try/fail until something works, but without understanding the code, especially in a large, complex, long-lived codebase, they can unwittingly break something without realising it - just like an intern or newbie on the project, which is the most common analogy for LLMs, with good reason. |
| |
| ▲ | namrog84 2 days ago | parent [-] | | While I do agree with you, to play devil's advocate: what if we get to the point where all software is basically created 'on the fly' as greenfield projects, as needed, and you never need a complex, large, long-lived codebase? It is probably incredibly wasteful, but ignoring that, could it work? | | |
| ▲ | fwip 2 days ago | parent | next [-] | | That sounds like an insane way to do anything that matters. Sure, create a one-off app to post things to your Facebook page. But a one-off app for the OS it's running on? Freshly generating the code for your bank transaction rules? Generating an authorization service that gates access to your email? The only reason it's quick to create green-field projects is because of all these complex, large, long-lived codebases that it's gluing together. There's ample training data out there for how to use the Firebase API, the Facebook API, OS calls, etc. Without those long-lived abstraction layers, you can't vibe out anything that matters. | | |
| ▲ | theshrike79 2 days ago | parent [-] | | In Japan, buildings (apartments) aren't built to last forever. They are built with a specific lifespan in mind, acknowledging the fact that houses are depreciating assets whose value tends toward zero. The only reason we don't do that with code (or didn't used to) was that rewriting from scratch NEVER worked[0], and large-scale refactors take massive amounts of time and resources, so much so that there are whole books written about how to do them. But today, trivial to simple applications can be rewritten from spec or from scratch in an afternoon with an LLM, and even pretty complex parsers can be ported provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small-to-medium-size application from one language to another using the previous app as the "spec". [0] https://www.joelonsoftware.com/2000/04/06/things-you-should-... [1] https://simonwillison.net/2025/Dec/15/porting-justhtml/ | |
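(To make the "previous app as the spec" idea concrete: below is a minimal sketch of a differential test, where the legacy implementation acts as the oracle for a rewrite. The function names and sample cases are hypothetical placeholders, not anything from the linked posts.)

```python
# Differential test: treat the legacy implementation as the de facto spec and
# check that the rewritten/ported implementation agrees with it case by case.
# old_tokenize / new_tokenize are hypothetical stand-ins for the two versions.

def old_tokenize(text: str) -> list[str]:
    # Legacy behaviour we want to preserve.
    return [t for t in text.replace(",", " , ").split() if t]

def new_tokenize(text: str) -> list[str]:
    # Fresh rewrite whose behaviour must match the legacy version.
    return [t for t in text.replace(",", " , ").split() if t]

CASES = ["hello, world", "", "a,b,,c", "  spaced   out  "]

def test_port_matches_legacy() -> None:
    for case in CASES:
        assert new_tokenize(case) == old_tokenize(case), f"diverged on {case!r}"

if __name__ == "__main__":
    test_port_matches_legacy()
    print("rewrite matches legacy behaviour on all recorded cases")
```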
| ▲ | techblueberry a day ago | parent | next [-] | | > But today, trivial to simple applications can be rewritten from spec or from scratch in an afternoon with an LLM, and even pretty complex parsers can be ported provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small-to-medium-size application from one language to another using the previous app as the "spec". |
| This seems like a sort of chicken-and-egg thing. The _reason_ you don't rewrite code is that it's hard to know that you truly understand the spec. If you could perfectly understand the spec, then you could rewrite the code, but then what is the software: the code, or the spec that produces the code? So if you built code A from a spec, rebuilding it from that spec doesn't qualify as a rewrite; it's just a recompile. If you're trying to fundamentally build a new application from a spec when the old application was written by hand, you're going to run into the same problems you have in a normal rewrite. We already have an example of this: TypeScript applications are basically rewritten every time you compile TypeScript to JavaScript for Node. TypeScript isn't the executed code; it's a spec. |
| edit: I think I missed that you said rewrite in a different language; then yeah, fine, you're probably right, but I don't think most people are architecture-agnostic when they talk about rewrites. The point of a rewrite is to keep the good stuff and lose a lot of the bad stuff. If you're using the original app as a spec to rewrite in a new language, then fine, yeah, LLMs may be able to do this relatively trivially. | |
| ▲ | fwip 2 days ago | parent | prev [-] | | Sure, and the buildings are built to a slowly-evolving code, using standard construction techniques, operating as a predictable building in a larger ecosystem. The problem with "all software" being AI-generated is that, to use your analogy, the electrical standards, foundation, and building materials have all been recently vibe-coded into existence, and none of your construction workers are certified in any of it. |
|
| |
| ▲ | techblueberry a day ago | parent | prev | next [-] | | I don't think so. I don't think this is how human brains work, and you would have too many problems trying to balance things out. I'm thinking specifically of a complex distributed system: there are a lot of tweaks and iterations you need for things to work with each other. But then maybe this changes what a "codebase" is. If a codebase is just a structured set of specs that compile to code, a la TypeScript -> JavaScript, sure, but then it's still a long-lived <blank>. But maybe you would have to elaborate on what "creating software on the fly" looks like, because I'm sure there's a definition where the answer is yes. | |
| ▲ | damethos 2 days ago | parent | prev [-] | | I have the same questions in my head lately. |
|
|
|
| ▲ | bayindirh 2 days ago | parent | prev | next [-] |
| Well, the first 90% is easy; the hard part is the second 90%. Case in point: self-driving cars. Also, consider that we needed to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders. |
| |
| ▲ | throwthrowuknow 2 days ago | parent | next [-] | | Even if Opus 4.5 is the limit, it's still a massively useful tool. I don't believe it's the limit, though, for the simple fact that a lot could be done by creating more specialized models for each subdomain; i.e., they've focused mostly on web-based development but could do the same for any other paradigm. | |
| ▲ | emodendroket 2 days ago | parent | next [-] | | That's a massive shift in the claim though... I don't think anyone is disputing that it's a useful tool; just the implication that, because it's a useful tool and has seen rapid improvement, it's going to "get all the way there," so to speak. | |
| ▲ | bayindirh 2 days ago | parent | prev [-] | | Personally, I'm not against LLMs or AI itself, but considering how these models are built and trained, I refuse to use tools built on others' work without or against their consent (esp. GPL/LGPL/AGPL, Non-Commercial / No-Derivatives CC licenses and Source Available licenses). Of course, the tech will be useful and ethical if these problems are solved, or a decision is made to solve them the right way. | |
| ▲ | ForHackernews 2 days ago | parent [-] | | We just need to tax the hell out of the AI companies (assuming they are ever profitable) since all their gains are built on plundering the collective wisdom of humanity. | | |
| ▲ | thfuran 2 days ago | parent | next [-] | | I don’t think waiting for profitability makes sense. They can be massively disruptive without much profit as long as they spend enough money. | |
| ▲ | encyclopedism 2 days ago | parent | prev [-] | | AI companies and corporations in general control your politicians so taxing isn't going to happen. |
|
|
| |
| ▲ | literalAardvark 2 days ago | parent | prev | next [-] | | They're not blenders. This is clear from the fact that you can distill the logic ability from a 700B-parameter model into a 14B model and maintain almost all of it. You just lose knowledge, which can be provided externally, and which is the actual "pirated" part. The logic is _learned_. | |
| ▲ | encyclopedism 2 days ago | parent | next [-] | | It hasn't learned any LOGIC. It has 'learned' patterns from the input. | | | |
| ▲ | bayindirh 2 days ago | parent | prev [-] | | Are there any recent publications about it so I can refresh myself on the matter? | | |
| ▲ | D-Machine 2 days ago | parent [-] | | You won't find any trustworthy papers on the topic because GP is simply wrong here. That models can be distilled has no bearing whatsoever on whether a model has learned actual knowledge or understanding ("logic"). Models have always learned sparse/approximately-sparse and/or redundant weights, but they are still all doing manifold-fitting. The resulting embeddings from such fitting reflect semantics and semantic patterns. For LLMs trained on the internet, the semantic patterns learned are linguistic, which are not just strictly logical, but also reflect emotional, connotational, conventional, and frequent patterns, all of which can be illogical or just wrong. While linguistic semantic patterns are correlated with logical patterns in some cases, this is simply not true in general. |
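(For readers following this exchange: "distillation" here usually refers to soft-target knowledge distillation, where a small student model is trained to match a large teacher's output distribution. A minimal sketch, assuming PyTorch; note the objective only matches output distributions, which is exactly why the thread disagrees about whether any "logic" is transferred.)

```python
# Minimal sketch of soft-target knowledge distillation (Hinton-style),
# assuming PyTorch. The student is trained to match the teacher's softened
# output distribution; nothing in the loss refers to reasoning explicitly.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 as is conventional.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-token vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```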
|
| |
| ▲ | 2 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | mcfedr 2 days ago | parent | prev | next [-] | | I like to think of LLMs as random number generators with a filter. | |
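(The quip above maps loosely onto how decoding actually works: the model emits a probability distribution and the sampler draws from it after filtering, e.g. temperature scaling plus top-k. A minimal sketch with numpy; the vocabulary and logits are made up purely for illustration.)

```python
# "Random number generator with a filter", made literal: sample the next token
# from a model's output distribution after temperature scaling and top-k filtering.
# The logits vector and vocabulary here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
logits = np.array([2.0, 1.5, 0.3, 0.2, 0.1, -1.0])  # pretend model output

def sample_next(logits: np.ndarray, temperature: float = 0.8, top_k: int = 3) -> int:
    scaled = logits / temperature
    # The "filter": keep only the top-k candidates, mask the rest out.
    cutoff = np.sort(scaled)[-top_k]
    scaled = np.where(scaled >= cutoff, scaled, -np.inf)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # The "random number generator": draw from the filtered distribution.
    return rng.choice(len(logits), p=probs)

print(vocab[sample_next(logits)])
```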
| ▲ | rat9988 2 days ago | parent | prev [-] | | > Well, the first 90% is easy, the hard part is the second 90%. |
| You'd need to prove that this assertion applies here. I understand that you can't deduce the rate of future gains from the past, but you also can't state this as a universal truth. | |
| ▲ | bayindirh 2 days ago | parent | next [-] | | No, I don't need to. Self-driving cars are the most recent and biggest example, sans LLMs. The saying I quoted (which has different forms) is valid for programming, construction and even cooking, so it's a simple, well-understood baseline. Knowledge engineering has a notion called "covered/invisible knowledge", which points to the small things we do unknowingly that change the whole outcome. None of the models (even AI in general) can capture this. We can say it's the essence of being human, or the tribal knowledge that makes an experienced worker who they are, or makes mom's rice taste that good. Considering these are highly individualized and unique behaviors, a model based on averaging everything can't capture this essence easily, if it ever can, without extensive fine-tuning for/with that particular person. | |
| ▲ | enraged_camel 2 days ago | parent | next [-] | | >> No, I don't need to. Self driving cars is the most recent and biggest example sans LLMs. Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid. >> The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline. Sure, but the question is not "how long does it take for LLMs to get to 100%". The question is, how long does it take for them to become as good as, or better than, humans. And that threshold happens way before 100%. | | |
| ▲ | bayindirh 2 days ago | parent [-] | | >> Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid. |
| Doesn't matter, because if we're talking about AI models, no (type of) model reaches 100% linearly, or reaches 100% ever. For example, recognition models run with probabilities. Like Tesla's Autopilot (TM), which loves to hit rolled-over vehicles because it has not seen enough vehicle underbodies to classify them. Same for scientific classification models: they emit probabilities, not certain results. |
| >> Sure, but the question is not "how long does it take for LLMs to get to 100%" |
| I never claimed that a model needs to reach a proverbial 100%. |
| >> The question is, how long does it take for them to become as good as, or better than, humans. |
| They can be better than humans at certain tasks. They have actually been better than humans at some tasks since the 70s, but we like to disregard those to romanticize current improvements. I don't believe the current, or any, generation of AIs can be better than humans at anything and everything, at once. Remember: no machine can construct something more complex than itself. |
| >> And that threshold happens way before 100%. |
| Yes, and I consider that "threshold" as "complete", if they can ever reach it for certain tasks, not "any" task. |
| |
| ▲ | rat9988 2 days ago | parent | prev | next [-] | | Self-driving cars are not a proof. They only show that quick gains don't necessarily mean you'll get to 100% fast. They don't prove that the same will necessarily happen here. | |
| ▲ | damethos 2 days ago | parent | prev | next [-] | | "covered/invisible knowledge" aka tacit knowledge | | |
| ▲ | bayindirh 2 days ago | parent [-] | | Yeah, I failed to remember the term while writing the comment. Thanks! |
| |
| ▲ | thfuran 2 days ago | parent | prev [-] | | > None of the models (even AI in general) can capture this |
| None of the current models, maybe, but not AI in general? There’s nothing magical about brains. In fact, they’re pretty shit in many ways. | |
| ▲ | bayindirh 2 days ago | parent | next [-] | | A model trained on a very large corpus can't, because these behaviors are different or specialized enough that they cancel each other out in most cases. You can forcefully fine-tune a model on a single person's behavior up to a certain point, but I'm not sure even that can capture the subtlest behaviors or decision mechanisms, which are generally the most important ones (the ones we call gut feeling or instinct). OTOH, while I won't call the human brain perfect, the things we label "shit" generally turn out to be very clever and useful optimizations that work around its own limitations, so I regard the human brain more highly than most AI proponents do. Also, we shouldn't forget that we don't know much about how that thing works; we only guess and try to model it. Lastly, seeking perfection in numbers and charts, or in an engineering sense, is misunderstanding nature and doing it a great disservice, but that is a subject for another day. | |
| ▲ | emodendroket 2 days ago | parent | prev [-] | | The understanding of the brain is far from complete whether they're "magical" or "shit." | | |
|
| |
| ▲ | sanderjd 2 days ago | parent | prev | next [-] | | I read the comment more as "based on past experience, it is usually the case that the first 90% is easier than the last 10%", which is the right base case expectation, I think. That doesn't mean it will definitely play out that way, but you don't have to "prove" things like this. You can just say that they tend to be true, so it's a good expectation to think it will probably be true again. | |
| ▲ | rybosworld 2 days ago | parent | prev [-] | | The saying is more or less treated as a truism at this point. OP isn't claiming something original and the onus of proving it isn't on them imo. I've heard this same thing repeated dozens of times, and for different domains/industries. It's really just a variation of the 80/20 rule. |
|
|
|
| ▲ | PunchyHamster 2 days ago | parent | prev | next [-] |
| Note that blog posts rarely show the 20 other times it failed to build something, only the one time it happened to work. We've been seeing the same progression with self-driving cars, and they have been stuck on the last 10% for the last 5 years. |
| |
| ▲ | redhale 2 days ago | parent | next [-] | | I agree with your observation, but not your conclusion. The 20 times it failed basically don't matter -- they are branches that can just be thrown away, and all that was lost is a few dollars on tokens (ignoring the environmental impact, which is a different conversation). As long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better. And I don't know why people always jump to self-driving cars as a negative analogy. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations. But they basically work and they're amazing. I feel similarly about agentic development; plus, in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around. | |
| ▲ | theshrike79 2 days ago | parent | prev [-] | | With "art" we're now at a point where I can get 50 variations of an image prompt within seconds from an LLM. Does it matter that 49 of them "failed"? It cost me fractions of a cent, so not really. If every one of the 50 variants had been drawn by a human and iterated over days, there would have been a major cost attached to every image, and I most likely wouldn't have asked for 50 variations anyway. It's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. Complete waste of time and money with humans, but with LLMs you can just do it. |
|
|
| ▲ | Scea91 2 days ago | parent | prev | next [-] |
| > - (~2023) Ok, it can write a full function, but it can't write a full feature. |
| The trend is definitely here, but even today it heavily depends on the feature. While very useful, it requires intense iteration and human insight for > 90% of our backlog. We develop a cybersecurity product. |
|
| ▲ | sanderjd 2 days ago | parent | prev | next [-] |
| Yeah maybe, but personally it feels more like a plateau to me than an exponential takeoff, at the moment. And this isn't a pessimistic take! I love this period of time where the models themselves are unbelievably useful, and people are also focusing on the user experience of using those amazing models to do useful things. It's an exciting time! But I'm still pretty skeptical of "these things are about to not require human operators in the loop at all!". |
| |
| ▲ | throwthrowuknow 2 days ago | parent [-] | | I can agree that it doesn’t seem exponential yet, but this is at least linear progression, not a plateau. | |
| ▲ | sanderjd 2 days ago | parent [-] | | Linear progression feels slower (and thus more like a plateau) to me than the end-of-2022 through end-of-2024 period did. The question in my mind is where we are on the S-curve. Are we just now entering hyper-growth? Or are we starting to level out toward maturity? It seems like it must still be hyper-growth, but it feels less that way to me than it did a year ago. In large part, my sense is that there are two curves happening simultaneously, but at different rates: the growth in capabilities, and the growth in adoption. It's the first curve that seems to me to have slowed a bit. Model improvements seem both amazing and also less revolutionary to me than they did a year or two ago. But the other curve is adoption, and I think that one is way further from maturity. The providers are focusing more on the tooling now that the models are good enough. I'm seeing "normies" (that is, non-programmers) starting to realize the power of Claude Code in their own workflows. I think that's gonna be huge and is just getting started. |
|
|
|
| ▲ | EthanHeilman 2 days ago | parent | prev | next [-] |
| I haven't seen an AI successfully write a full feature in an existing codebase without substantial help; I don't think we are there yet. |
| > The only question is how long it takes to get there. |
| This is the question, and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real-world tasks probably fit into a complexity hierarchy similar to computational complexity. One of the reasons the AI predictions made in the 1950s for the 1960s did not come to pass was that we assumed problem difficulty scaled linearly: double the computing speed, get twice as good at chess or twice as good at planning an economy. The separation between P and NP undermined those predictions, and it is likely that current predictions will run into similar separations. It is probably the case that if you made a human 10x as smart, they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence (they are not 10x more intelligent) and more about knowledge and wisdom. |
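(A back-of-the-envelope illustration of why "double the compute, get twice as good at chess" fails: with exponential game trees, doubling compute buys only a logarithmic sliver of extra search depth. The branching factor of ~35 is a commonly cited rough figure for chess, assumed here purely for illustration.)

```python
# Back-of-envelope: if searching one ply deeper multiplies the work by the
# branching factor b, then doubling total compute only adds log(2)/log(b) plies.
# b ~ 35 is a common rough figure for chess; the exact value doesn't matter here.
import math

branching_factor = 35
extra_depth_from_doubling = math.log(2) / math.log(branching_factor)
print(f"2x compute ~ {extra_depth_from_doubling:.2f} extra plies of search")
# Roughly 0.19 plies: nowhere near "twice as good at chess".
```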
|
| ▲ | kubb 2 days ago | parent | prev | next [-] |
| Each of these years we’ve had a claim that it’s about to replace all engineers. By your logic, does it mean that engineers will never get replaced? |
|
| ▲ | HarHarVeryFunny 2 days ago | parent | prev | next [-] |
| Sure, eventually we'll have AGI, and then no worries, but in the meantime you can only use the tools that exist today, and dreaming about what should be available in the future doesn't help. I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next step, from LLM to AGI, where it becomes capable of using human-level judgement and reasoning, etc., to become a developer, not just a coding tool. |
|
| ▲ | itsthecourier 2 days ago | parent | prev | next [-] |
| I use it on a 10-year-old codebase; I need to explain where to get context, but it successfully works 90% of the time. |
|
| ▲ | ugurs 2 days ago | parent | prev | next [-] |
| Ok, it can create a long-lived complex codebase for a product that is extensible and scalable over the long term, but it doesn't have cool tattoos and can't fancy a matcha |
|
| ▲ | mjr00 2 days ago | parent | prev [-] |
| This is disingenuous because LLMs were already writing full, simple applications in 2023.[0] They're definitely better now, but it's not like ChatGPT 3.5 couldn't write a full, simple todo-list app in 2023. There were a billion blog posts talking about that and how it meant the death of the software industry. Plus, I'd actually argue more of the improvements have come from the tooling around the models rather than what's in the models themselves. [0] e.g. https://www.youtube.com/watch?v=GizsSo-EevA |
| |
| ▲ | blitz_skull 2 days ago | parent [-] | | What LLM were you using to build full applications in 2023? That certainly wasn’t my experience. | | |
| ▲ | mjr00 2 days ago | parent [-] | | Just from googling, here's a video "Use ChatGPT to Code a Full Stack App" from May 18, 2023.[0] There's a lot of non-ergonomic copy and pasting but it's definitely using an LLM to build a full application. [0] https://www.youtube.com/watch?v=GizsSo-EevA | | |
| ▲ | blitz_skull 2 days ago | parent [-] | | That's not at all what's being discussed in this article. We copy-pasted from SO before this. This article is talking about 99% fully autonomous coding with agents, not copy-pasting 400 times from a chat bot. | | |
| ▲ | mjr00 2 days ago | parent [-] | | Hi, please re-read the parent comment, which was claiming: |
| > Starting back in 2022/2023: |
| > - (~2022) It can auto-complete one line, but it can't write a full function. |
| > - (~2023) Ok, it can write a full function, but it can't write a full feature. |
| This was a direct refutation, with evidence, of the claim that in 2023 LLMs "can't write a full feature", because, as demonstrated, people were already building full applications with them at the time. This obviously is not talking exclusively about agents, because agents did not exist in 2022. | |
| ▲ | redhale 2 days ago | parent [-] | | I get your point, but I'll just say that I did not intend my comment to be interpreted so literally. Also, just because SOMEONE planted a flag in 2023 saying that an LLM could build an app certainly does NOT mean that "people were not claiming that LLMs "can't write a full feature"". People in this very thread are still claiming LLMs can't write features. Opinions vary. |
|
|
|
|
|