| ▲ | snaking0776 7 hours ago |
| I know this is likely just for IPO hype, but when I read things like this I sometimes wonder if I must be missing something. I use agents every day and find them really useful; they save me a lot of headaches. At the same time, I find that if I let them self-direct at a high level at all, they generally make bad choices that cause me headaches later, so I can’t really give them autonomy. Enough people seem to believe this exponential line of thinking, though, that I keep having to wonder: am I the one missing something here? Is there some magic tool that I haven’t found yet that will cure cancer? |
|
| ▲ | jonas21 6 hours ago | parent | next [-] |
| What did your AI-assisted workflow look like 1 year ago? I can only speak for myself, but I would carefully specify a class or module in great detail and then hand it off to the model to implement, then carefully review the result. How about 2 years ago? Back then, I wouldn't even trust it to write a 5-line function without making some sort of silly mistake. Today, I can leave an agent running by itself for 20 or 30 minutes and most of the time, it comes back with a result that's either flawless or can be refined to be good with a few back and forth messages. Maybe I still have to make some high-level decisions ahead of time, but all of the details, including exploring the codebase and figuring out what to do based on that, can be left to the agent. The amount of improvement just in the last 2 years has been staggering. Now extrapolate how things will look if the trend continues for another 2 or 3 years. Is this guaranteed to happen? No. But people have been predicting that we're going to hit a wall for a long time now, and we haven't yet. Maybe there's a wall just ahead of us. But maybe there's not -- and the "not" case seems likely enough that we should at least be planning for it. |
| |
| ▲ | techblueberry 13 minutes ago | parent | next [-] | | https://fortune.com/2026/01/29/100-percent-of-code-at-anthro... I feel like the problem is there aren’t any great metrics. Boris Cherny probably gets paid like $2 mil per year. So what does it mean that Claude writes 100% of his code? And Claude writes 100% of code for most teams? Has Anthropic started laying people off? If Claude is writing 100% of code doesn’t that mean game over? It’s both amazing and kind of a useless metric. How do I extrapolate out 100% 2-3 years from now? Super-duper 100%? Infinity infinity? | |
| ▲ | dontlikeyoueith 3 hours ago | parent | prev | next [-] | | I disagree with your assessment pretty strongly -- the models themselves hit a wall over a year ago once companies exhausted all existing training data. LLMs don't induce world models, and they aren't capable of real search and planning outside their training distributions. They, structurally, never will be. I haven't noticed a change in a year in what I trust a model to generate in response to a single prompt. The failure modes are unchanged. Yes, specific failures have improved as they have been documented and passed into model training data, but the way the models fail has not changed. They still fail for me nearly every single day. I'm a pretty heavy user - 3-4 Claude Code processes running at a time, all day every day. What has gotten better is the tooling around the model -- but there's no space for exponential growth there. At least, not without an exponential cost increase, which would make the whole thing untenable anyway. | |
| ▲ | snaking0776 3 hours ago | parent | prev | next [-] | | I wonder if our difference in view could be an instance of the jagged nature of AI’s intelligence. I do computational research in a basic science, so I write code or build models basically all day, and that work is (occasionally) novel. I would say that I’ve noticed exponential improvements in parts of my job but certainly not all. For example, if I’m trying to visualize a concept from a paper, I now go straight to Codex, give it the paper, and describe a webapp that lets me play with the model in a way that wasn’t possible one year ago (this is great for teaching, btw). If I have a script that I want to generalize, add better metrics to, or set up for running on a cluster, I use Codex and it does great. Where it fails me, though, is exactly when I’m doing something novel, like developing a new model or trying to develop some new method to process data. I’ve tried many times to one-shot these ideas with detailed descriptions of what I want, how I’d like to generate abstractions, etc., and it almost always ends up changing what I want into what I can only describe as something that better matches its training data. It often quietly changes key details, which means I have to delete the whole thing and start over. Just today this happened. On this level of task, I’ve found that my workflow and pace of iteration haven’t really changed at all in the last year. I still have to go and explain in detail, on a function-by-function level, what I want, in much the same way I did a year ago. While that’s obviously a harder task, it seems to me like the task this whole long-term exponential argument hinges on. I obviously could be wrong, and maybe an LLM with an eval loop will do all of this for us, but it still seems quite bad at anything without a clear definition of “good”.
I’m personally much more concerned about autonomous weapons, surveillance, and people plugging these things into places they don’t belong to avoid responsibility than I am about the general possibility of these models being smarter than me in every way. But obviously I could be wrong on this and am just using it incorrectly, hence the question. | |
| ▲ | baq 6 hours ago | parent | prev [-] | | > Now extrapolate how things will look if the trend continues for another 2 or 3 years. …and humans are famously bad at extrapolating exponentially, which is kinda the point of the essay. |
|
|
| ▲ | AgentME 7 hours ago | parent | prev | next [-] |
| If we were already at the point where AI could self-direct effectively, then the world would already be very different (e.g. AI-driven technological progress and unemployment) in a way that we might have wished we prepared for more. |
| |
| ▲ | david_shaw 6 hours ago | parent [-] | | > we might have wished we prepared for more Do you mean policy-wise (like Dario is talking about), or more broadly? I wonder about broad preparedness, but unfortunately there's not a lot that we "normal" people can do to prepare. Hoard savings and food? Learn physical trades? |
|
|
| ▲ | ofjcihen 7 hours ago | parent | prev | next [-] |
| N+1. This is my experience and for the most part the people that I work with share the same feeling. A highly enthusiastic concussion enthusiast with 10 hands is how one person put it. These are people in different fields but highly accomplished so I’m feeling comfortable sharing their assessment. |
| |
| ▲ | TobyTheCamel 6 hours ago | parent [-] | | You're talking quite statically, though. I don't think anyone is worried about today's models being a serious threat, but what about next year's, or those three years from now? Just three years ago these models were useful bumbling fools, and it's hard to judge where on the S-curve we currently are. I'd rather be thinking about these issues in advance than waiting until the problem becomes real. | | |
| ▲ | dontlikeyoueith 3 hours ago | parent | next [-] | | The models are still useful bumbling fools. We're in the flat part of the curve because we've exhausted existing data sources. | |
| ▲ | ofjcihen 6 hours ago | parent | prev [-] | | > I don't think anyone is worried about today's models being a serious threat Fable is essentially bricked for my areas of interest (even being a member of the cybersecurity program). It seems like they’re attempting to sell regulatory capture under the guise of safety. That’s more of the point. |
|
|
|
| ▲ | aspenmartin 7 hours ago | parent | prev | next [-] |
| It's nice that people are genuinely curious about this.
- All of your observations are absolutely dead on.
- Yet we have very very very robust scaling laws that, as Dario points out, we've had and validated for over a decade. This extends to downstream measures like the METR time horizon and composite benchmarks like Epoch's capability index.
- If you look at where you're at now, which is again dead on, you're looking at a point on a curve that is quite easy to extrapolate. What's harder is telling exactly when on that curve a given capability or use case will undergo a step change, because error rates drop below some threshold that is hard to anticipate in advance.
So while Dario and other frontier CEOs are understandably unpalatable, they are absolutely spot on in calling out that all of this is bound to happen and happen quickly, and that's without solving several core problems that haven't been solved yet (e.g. continual learning). In 2023, coding agents were just laughable. Yet they followed the same predictable training curves. Anyone looking at the data can see the obvious, while anyone reading newspaper headlines or Hacker News comments would get a very different impression. |
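The "step change" point above can be illustrated with a toy model. Assume, purely hypothetically, that an agent task consists of `steps` independent steps that each fail with probability `per_step_error`; the whole task then succeeds with probability (1 - e)^n. The helper names and all numbers are illustrative assumptions, not measurements of any real model, and real steps are rarely independent.

```python
def task_success(per_step_error: float, steps: int) -> float:
    """Toy model: probability that all `steps` independent steps succeed."""
    return (1.0 - per_step_error) ** steps


def max_reliable_steps(per_step_error: float, target: float = 0.8) -> int:
    """Longest task (in steps) whose toy success probability stays >= target."""
    if not 0.0 < per_step_error < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("per_step_error and target must lie strictly between 0 and 1")
    steps = 0
    while task_success(per_step_error, steps + 1) >= target:
        steps += 1
    return steps


if __name__ == "__main__":
    # Halving the hypothetical per-step error again and again is a smooth change,
    # yet the task length that clears an 80% bar roughly doubles each time.
    for err in (0.08, 0.04, 0.02, 0.01, 0.005):
        print(f"error={err:.3f}  p(50 steps)={task_success(err, 50):.2f}  "
              f"max steps at 80%={max_reliable_steps(err)}")
```

Under this toy model, a 50-step task goes from almost never succeeding to usually succeeding over a few halvings of per-step error, which is one way a smooth curve can look like a sudden capability jump.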
| |
| ▲ | oudlys 6 hours ago | parent | next [-] | | Are we plotting against cost? How is the capability advancement vs dollars paid for development? By my read of the (very sparse) data, we're getting linear improvements in capability for super-linear increases in costs. [1] indicates that by 2027 models will cost $1 billion to train. Dario estimates that model runs will cost $10 billion in 2026 [2]. That to me indicates costs are potentially growing faster than capability. Maybe by quite a bit. If the value prop of LLMs doesn't prove out, that won't last. I'm of the opinion there is no data that shows actual economic value being delivered by models. The best data shows that LLM use might be destroying value [3].
[1] https://epoch.ai/publications/how-much-does-it-cost-to-train...
[2] https://lexfridman.com/dario-amodei-transcript/
[3] https://unessays.substack.com/p/talk-is-cheap | | |
| ▲ | aspenmartin 5 hours ago | parent | next [-] | | I appreciate the data here, but I don't think the read is quite right. Saying we have linear capability gains for super-linear cost compares an unbounded variable (dollars) to bounded instruments (because benchmarks saturate). On unbounded measures, growth is exponential; you can see METR time horizons double every ~4-7 months (https://metr.org/blog/2026-1-29-time-horizon-1-1/). And capability being proportional to log(compute) is what the scaling law predicts. Epoch puts training cost growth at ~2.4x/year, as your link shows. Meanwhile, the cost for fixed capability falls ~10-40x/year (https://epoch.ai/data-insights/llm-inference-price-trends), and lab revenue is growing ~10x/year! Anthropic went from $1B to $9B to a $30B+ run rate in ~15 months, OpenAI ~$25B. On [3]: the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev. The RCT evidence is genuinely mixed (METR: -19%, with n = 20 and Claude 3.x; Cui et al: +26%), but it's just super hard to do this well. I think the Faros stuff was pretty cool; I hadn't seen it before, so thank you for the reference. | | |
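The growth rates traded in this exchange are easier to compare on a common per-year footing. A minimal sketch, taking the ~4-7 month METR doubling range and Epoch's ~2.4x/year training-cost figure as quoted in the comment (not re-derived here). The `toy_score` function is a hypothetical log-linear law with made-up constants, included only to show why "capability proportional to log(compute)" looks like linear gains for exponential spend.

```python
import math


def annual_factor_from_doubling(months: float) -> float:
    """Growth multiple over 12 months for a quantity that doubles every `months`."""
    return 2.0 ** (12.0 / months)


def doubling_months_from_annual(factor: float) -> float:
    """Doubling time, in months, for a quantity that grows `factor`x per year."""
    return 12.0 * math.log(2.0) / math.log(factor)


def toy_score(compute_flop: float, a: float = -100.0, b: float = 5.0) -> float:
    """Hypothetical log-linear law: score = a + b * log10(compute).

    a and b are made-up constants; only the shape of the curve matters here.
    """
    return a + b * math.log10(compute_flop)


if __name__ == "__main__":
    # The ~4-7 month doubling range quoted for METR time horizons.
    for months in (4, 7):
        print(f"doubles every {months} months -> "
              f"{annual_factor_from_doubling(months):.1f}x per year")

    # The ~2.4x/year training-cost growth figure quoted from Epoch.
    print(f"2.4x per year -> doubles every "
          f"{doubling_months_from_annual(2.4):.1f} months")

    # Under log scaling, each 10x of compute buys the same fixed score gain,
    # so a linear-looking score curve corresponds to exponential spend.
    for compute in (1e24, 1e25, 1e26):  # hypothetical FLOP budgets
        print(f"compute={compute:.0e}  toy score={toy_score(compute):.1f}")
```

On these quoted figures, a 4-7 month doubling corresponds to roughly 3x-8x per year, against training costs that double about every 9.5 months; whether that comparison is meaningful depends on whether time horizon is the right capability measure, which the thread disputes.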
| ▲ | oudlys 5 hours ago | parent | next [-] | | >"On unbounded measures, growth is exponential"
Maybe. There was a great comment in the thread on Fable 5 yesterday about benchmark comparisons between Fable and the latest Opus models. Here it is: https://news.ycombinator.com/item?id=48464600. You could be right, but this is the most direct benchmark comparison I could find, and it's not that strong.
>the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev.
I discuss this directly in my analysis. There's also an 860% code churn increase ratio. You only need 9% of that to be allocated to wasteful rework to drive throughput flat to the 15% rework baseline. Not to an assumed ideal state where there was no rework. But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO. I appreciate you reading my stuff and taking the data seriously. Thank you. | | |
| ▲ | andrekandre 3 hours ago | parent [-] | | > But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO.
n=1, but at $JOB we have throughput quotas now, and what is happening is that teams are just finding lots of busywork (renaming things, gardening AI .md files, rewriting UIs, etc.) and also dividing PRs into smaller chunks to match the quotas... so even a "throughput increase" doesn't say much if it's not improving the customer outcome (IME, anyways) |
| |
| ▲ | balefulboy 5 hours ago | parent | prev [-] | | METR's time horizon is not a reliable metric of LLM capability growth: https://www.transformernews.ai/p/against-the-metr-graph-codi... |
| |
| ▲ | simianwords 6 hours ago | parent | prev [-] | | >By my read of the (very sparse) data, we're getting linear improvements in capability for super-linear increases in costs. [1] indicates that by 2027 models will cost $1 billion to train. Dario estimates that model runs will cost $10 billion in 2026 [2]. That to me indicates costs are potentially growing faster than capability. Maybe by quite a bit.
This is true and well established. As long as you get any improvement whatsoever, it is worth spending to train, since it pays off during deployment. Imagine training cost not $1 billion but $100 billion, and the performance improved by just 10%. This is still worth it because you can squeeze out the profits across years and years, right? The improvement is everlasting.
> The best data shows that LLM use might be destroying value [3].
This is basically a conspiracy theory, and if you really believed this, you should not have led with "How is the capability advancement vs dollars paid for development?", because if there were no value, it doesn't really matter how much you invest. | | |
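The amortization argument can be made concrete with a simple payback calculation. Every number here is hypothetical (the comment gives only the $100 billion / 10% thought experiment, not profit figures), the helper names are illustrative, and the sketch deliberately ignores discounting and inference costs. It does model one assumption the argument leans on: how long the model stays in use before a newer one replaces it.

```python
import math


def payback_years(training_cost: float, extra_profit_per_year: float) -> float:
    """Years of incremental profit needed to recover a one-off training cost."""
    if extra_profit_per_year <= 0:
        return math.inf  # never pays back
    return training_cost / extra_profit_per_year


def net_over_life(training_cost: float, extra_profit_per_year: float,
                  useful_life_years: float) -> float:
    """Undiscounted incremental profit minus training cost over the model's life."""
    return extra_profit_per_year * useful_life_years - training_cost


if __name__ == "__main__":
    # Hypothetical: a $100B training run that adds $20B/year in profit.
    cost, extra = 100e9, 20e9
    print(f"payback: {payback_years(cost, extra):.1f} years")

    # The "everlasting" assumption matters: if a newer model replaces this
    # one before the payback point, the run loses money.
    for life in (2, 5, 10):
        net_billions = net_over_life(cost, extra, life) / 1e9
        print(f"useful life {life:>2} years -> net {net_billions:+.0f}B")
```

The design choice here is to make useful life an explicit parameter: the "worth it" conclusion holds only when the improvement keeps earning past the payback point.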
| ▲ | oudlys 5 hours ago | parent [-] | | >This is basically a conspiracy theory
I think this is pretty uncharitable, especially when I've provided you with a dataset you can evaluate yourself and an argument you can review for logical inconsistency. I have worked quite hard to locate data that supports your thesis, and I can't find it. I've at least gone to the effort of documenting that search. Before you throw around such strong convictions, I suggest you actually look for yourself. | | |
| ▲ | simianwords 5 hours ago | parent [-] | | Respectfully, your link is not very convincing. But what’s interesting is that you are commenting on a post where Dario is suggesting that LLMs are so extremely powerful that they can take over, help synthesise bioweapons, help in warfare, and help in drug discovery. The whole point of the post here is to try and regulate this. If you believe AI can’t even create positive value, let alone discover new things, then your problem is somewhere else and not in something like “but training costs a lot”. So it is absolutely strange, and a stark contrast, to see you believe that LLMs are so weak as to create negative value while the CEO is asking for regulations because AI is too powerful. I don’t think I can convince you that AI is actually that powerful. But let me ask you something directly: if you believe what you believe, you should also acknowledge that AI doesn’t need regulations in the context Dario is proposing, since obviously AI can’t do anything he predicts. Do you agree? | | |
|
|
| |
| ▲ | snaking0776 5 hours ago | parent | prev [-] | | That’s interesting. I commented something about this elsewhere, but the part of the exponential argument that loses me is that it can often seem like a way to distract from issues that already exist and that we should be working to fix. Things like autonomous weapons or mass surveillance are already here and rather terrifying, and I would hope that we would dedicate our time to fixing those rather than having industry leaders focus so much on hypotheticals. While I guess the hypothetical scenario could be so bad that we must focus on it, I imagine a world that can’t come up with a way to spread wealth more equally or prevent the mass proliferation of surveillance technology through profit-seeking behavior will not be able to handle a digital superintelligence. So I keep coming back to the question: why is all I hear these industry leaders talking about is the threat of extinction? Maybe it’s just news coverage, but I would love to see a leading lab release research on the health effects of subaudible sound in datacenters, or on other immediately present issues, which would build goodwill towards these further out concerns. | | |
| ▲ | hollerith 2 hours ago | parent [-] | | >why is all I hear these industry leaders talking about is the threat of extinction? . . . I would love to see a leading lab release research on the health effects of subaudible sound in datacenters
It is straightforward for industry leaders to avoid living near data centers, but there's no way for them to insulate themselves from the extinction threat -- no way short of somehow eliminating the danger for everybody, which seems quite hard to do. Since industry leaders are as self-centered as everyone else, the extinction threat is what they think about. Also, you describe the extinction threat as "further out". A lot of us think there is already some small amount of AI extinction risk being incurred every day. I.e., we think the period of danger has already begun. | | |
| ▲ | snaking0776 44 minutes ago | parent [-] | | I see. I wonder how this works out in terms of risk/reward. I suppose if you assign extinction an infinitely negative value, then it would be the only issue worth thinking about. Where I think this line of thinking gets challenging is when you need to reason in terms of a counterfactual. A lot of these were already risks prior to AI (bioweapons, nukes, etc.), so I guess the question that matters is the marginal increase in probability as a result of AI. I could get more behind this way of framing it than saying that AI itself is the problem: it’s just that being more capable as a species increases risks. I think a lot of the pushback comes from the fact that it’s often the CEO, who stands to gain hugely, who is saying his tool is going to end the world and so we need public buy-in to support it. If instead it were framed as “general technological advancement is dangerous but potentially worthwhile,” I think more people would be on board. |
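The two framings in this exchange can be set side by side with toy numbers. All probabilities and costs below are hypothetical placeholders, not estimates of anything, and the helper names are illustrative. With an infinite cost, any nonzero probability makes the expected loss infinite, so nothing else can compete; the counterfactual framing instead counts only the increase in probability over a no-AI baseline.

```python
import math


def expected_loss(probability: float, cost: float) -> float:
    """Expected loss for an outcome with the given probability and cost."""
    if probability == 0.0:
        return 0.0  # avoid 0 * inf, which is NaN in IEEE floating point
    return probability * cost


def marginal_risk(p_with_ai: float, p_baseline: float) -> float:
    """Counterfactual increase in probability attributable to AI."""
    return p_with_ai - p_baseline


if __name__ == "__main__":
    # Framing 1: extinction treated as an infinite cost. Any p > 0 dominates.
    for p in (0.0, 1e-9, 0.01):
        print(f"p={p:g}: expected loss = {expected_loss(p, math.inf)}")

    # Framing 2: a finite (hypothetical) cost, counted only at the margin.
    p_baseline, p_with_ai = 0.010, 0.013  # placeholder probabilities
    delta = marginal_risk(p_with_ai, p_baseline)
    print(f"marginal risk from AI: {delta:.3f}")
    print(f"marginal expected loss: {expected_loss(delta, 1e6):,.0f} (arbitrary units)")
```

The second framing makes the debate about the size of `delta`, which is the quantity the comment says matters, rather than about whether the worst case is bad enough to override every other concern.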
|
|
|
|
| ▲ | margalabargala 6 hours ago | parent | prev | next [-] |
| I've experienced the same. That said Claude Code has a million features like loops that I know exist but never use. I imagine that spending a lot more time creating an initial spec goes a long way towards independence, I just don't usually do that. |
|
| ▲ | davnicwil 6 hours ago | parent | prev | next [-] |
| > this exponential line of thinking It's a clever argument because if you question it, you're reminded of the entire history of technological development which is, guess what, exponential. You're sometimes also dismissed as not understanding the concept of exponentials. This again is clever, as it's baked into the definition that if you don't see it happening, or can't imagine it happening, well that's precisely a tell you're living through an exponential! All the reasons you might give can be countered with, essentially, "that problem that seems clear today will go away sooner than you can imagine and when it does you'll be on the back foot, so you'd better just assume it will go away and project/plan accordingly". The trick is entirely that one cannot possibly deny the general power of exponential progress across all of technology, it's almost a law, but it doesn't work in the other direction - no particular local technology is owed exponential growth because of this general pattern. Sometimes things just cap out at merely 'useful' and don't improve much further, no matter how much you want to believe they won't, no matter how steep the progress curve (or, indeed, line) has been up to that point. To this point the narrative of what these tools can do over these last 3 or 4 years has always been way ahead of the reality. Everyone who works with the tools knows this. Not everyone wants it to be true, so some will not acknowledge it and will just keep pushing this year-ahead projection as ground truth today. Many (not all) of those people aren't builders, so they don't have to deal with present reality jarring up against this projection of what ought to be possible, they're safe just talking about what should hypothetically be possible and making plans around that that won't be executed for months to years anyway. 
This keeps the flywheel going, and in fairness, some of the reality has actually caught up in certain ways, so some of those plans will have to some degree worked out which spins the flywheel faster still. In the end though I just keep thinking: it's been 4 years (as referenced in the post). A lot has happened, the tools are very cool and very useful for certain things. But when I put my head up and look around in the world, even just the software world, nothing's really changed in terms of actual outcomes, in terms of new things appearing or being built that didn't exist 4 years ago. Certainly nothing feels instinctively like it's improved much, subjectively. Maybe this is what it feels like to be in the knee of a curve of an exponential, but it seems equally reasonable this is just a breakthrough that's kind of improving at a clip you'd expect it to for all the investment put in, but fundamentally is just a new tool that needs to be slowly commercialised in an economically rational way, as we gear up for the next breakthrough which may or may not be related. Who says it must just keep improving forever? This argument never made much sense to me. |
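The point that a steep curve so far does not guarantee continued exponential growth can be shown numerically: a logistic (S-shaped) curve is nearly indistinguishable from an exponential early on and only diverges as it approaches its ceiling. The ceiling, rate, and time steps below are arbitrary illustrative values, not fitted to any AI metric.

```python
import math


def exponential(t: float, rate: float = 1.0) -> float:
    """Pure exponential growth starting from 1 at t = 0."""
    return math.exp(rate * t)


def logistic(t: float, ceiling: float = 1000.0, rate: float = 1.0) -> float:
    """Logistic curve with the same starting value (1) and early growth rate."""
    return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-rate * t))


if __name__ == "__main__":
    # Early on the ratio stays close to 1; later the S-curve flattens out
    # while the exponential keeps compounding.
    for t in range(0, 11, 2):
        e, s = exponential(t), logistic(t)
        print(f"t={t:>2}  exponential={e:10.1f}  logistic={s:7.1f}  ratio={s / e:.2f}")
```

Both curves start at the same value and grow at the same early rate, so data from the early phase alone cannot distinguish them; only the ceiling, which is unknown in advance, separates them.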
|
| ▲ | newcommentsorry 7 hours ago | parent | prev [-] |
| This is a very tech-focused message board, populated mostly by tech insiders, so perhaps a little outside perspective will help people understand. Tech people are following a religious belief system whose utopian promise is the all-powerful computer that will end all suffering. I once read an article in Reason magazine from over 30 years ago about how an advanced computer in the future will bring everyone who has ever lived back from the dead and let them live in paradise. They were completely serious. Atheists reading this may object to my description of the tech belief system as religious, but I believe it is accurate. The idea that tech is an improvement and will improve people's lives is believed as an act of faith. Tech has its own moral systems based on some form of libertarian progressivism. And in the future, through the inevitable scientific magic of exponential something, a computer will ascend to godhood and judge all mankind for their actions before allowing some into eternal paradise. To what extent any of this is true is up for debate, but most West Coast tech elites are actively working towards this future, and these are the ideas that drive them. It's hard to talk to them about it because this is their worldview, and they imagine everyone believes what they do. |
| |
| ▲ | jgil 3 hours ago | parent | next [-] | | Heard a tongue-in-cheek comment about "building a god" from someone at one of these AI labs. The builders believe that the machine you describe will judge them positively, purely because they are building the system according to their judgment and beliefs. | |
| ▲ | aspenmartin 7 hours ago | parent | prev [-] | | > Tech people are following a religious belief system whose utopian promise is the all-powerful computer that will end all suffering.
Uh, I don't really think that's anywhere close to an accurate characterization of most people here. Everyone, including Dario and any researcher at any frontier lab, knows the situation is quite scary and unprecedented. There are problems that will be solved and diseases that will be cured, but will we be living in an Orwellian universe? Will a rogue drone swarm find you cowering in your basement and murder you? I mean, the technology for this is already mostly here; it's a matter of the willpower and budget to roll out something really evil. The comment's question is about capabilities, and why the discussion about capabilities is so often far removed from today's capabilities. | | |
| ▲ | nozzlegear 6 hours ago | parent [-] | | I dunno, I don't think an outside observer would be too hard-pressed to find that the fervor with which some talk about the endless possibilities and miraculous works of AI borders on the religious. | | |
| ▲ | aspenmartin 6 hours ago | parent [-] | | Some people, sure, but some people believe the earth is flat. What's beyond doubt is the impact AI has already had, something that maybe 2 years ago was dismissed as religious fervor. Statistics and robust measurements plus stable scaling laws make this pretty far from religious, I would say. |
|
|
|