| ▲ | einrealist 12 hours ago |
| > It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later. Somewhere, there are GPUs/NPUs running hot. You send all the necessary data, including information that you would never otherwise share. And you most likely do not pay the actual costs. It might become cheaper or it might not, because reasoning is a sticking plaster on the accuracy problem. You and your business become dependent on this major gatekeeper. It may seem like a good trade-off today. However, the personal, professional, political and societal issues will become increasingly difficult to overlook. |
|
| ▲ | cyode 9 hours ago | parent | next [-] |
| This quote stuck out to me as well, for a slightly different reason. The “tenacity” referenced here has been, in my opinion, the key ingredient in the secret sauce of a successful career in tech, at least in these past 20 years. Every industry job has its intricacies, but for every engineer who earned their pay with novel work on a new protocol, framework, or paradigm, there were 10 or more providing value by putting the myriad pieces together, muddling through the ever-waxing complexity, and crucially never saying die. We all saw others weeded out along the way for lacking the tenacity. Think the boot camp dropouts or undergrads who changed majors when first grappling with recursion (or emacs). The sole trait of stubbornness to “keep going” outweighs analytical ability, leetcode prowess, soft skills like corporate political tact, and everything else. I can’t tell what this means for the job market. Tenacity may not be enough on its own. But it’s the most valuable quality in an employee in my mind, and Claude has it. |
| |
| ▲ | noosphr 7 hours ago | parent | next [-] | | There is an old saying back home: an idiot never tires, only sweats. Claude isn't tenacious. It is an idiot that never stops digging because it lacks the meta cognition to ask 'hey, is there a better way to do this?'. Chain of thought's whole raison d'être was to let the model get out of the local minima it pushed itself into. The issue is that after a year it still falls into slightly deeper local minima. This is fine when a human is in the loop. It isn't what you want when you have a thousand idiots each doing a depth first search on what the limit of your credit card is. | | |
| ▲ | Havoc 7 hours ago | parent | next [-] | | > it lacks the meta cognition to ask 'hey, is there a better way to do this?'. Recently had an AI tell me this code (that it wrote) is a mess and suggested wiping it and starting from scratch with a more structured plan. That seems to hint at at least the outlines of some meta cognition. | | |
| ▲ | zzrrt 6 hours ago | parent | next [-] | | Haha, it has the human developer traits of thinking all old code is garbage, failing to identify oneself as the dummy who wrote this particular code, and wanting to start from scratch. | | |
| ▲ | dpkirchner 6 hours ago | parent [-] | | It's like NIH syndrome but instead "not invented here today". Also a very human thing. |
| |
| ▲ | rurp 3 hours ago | parent | prev | next [-] | | Perhaps. I've had an LLM tell me that code it wrote itself minutes earlier is deeply flawed garbage that should be rewritten. It could be a sign of deep meta cognition, or it might be due to cognitive gaps: it has no idea why it did something a minute ago and suddenly has a different idea. | |
| ▲ | lbrito 5 hours ago | parent | prev | next [-] | | Someone will say "you just need to instruct Claude.md to be more meta and do a wiggum loop on it" | |
| ▲ | teaearlgraycold 4 hours ago | parent | prev | next [-] | | I asked Claude to analyze something and report back. It thought for a while, said “Wow this analysis is great!”, and then went back to thinking before delivering the report. They’re auto-sycophantic now! | |
| ▲ | hyperadvanced 6 hours ago | parent | prev | next [-] | | Metacognition As A Service, you say? | | | |
| ▲ | karlgkk an hour ago | parent | prev [-] | | lol no it doesn’t. It hints at convincing language models |
| |
| ▲ | samusiam 6 hours ago | parent | prev | next [-] | | I mean, not always. I've seen Claude step back and reconsider things after hitting a dead end, and go down a different path. There are also workflows, loops that can increase the likelihood of this occurring. | |
| ▲ | cocacolacowboy 3 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | BeetleB 8 hours ago | parent | prev | next [-] | | This is a major concern for junior programmers. For many senior ones, after 20 (or even 10) years of tenacious work, they realize that such work will always be there, and they long ago stopped growing on that front (i.e. they had already peaked). For those folks, LLMs are a life saver. At a company I worked for, lots of senior engineers became managers because they no longer wanted to obsess over whether their algorithm has an off by one error. I think fewer will go the management route. (There was always the senior tech lead path, but there are far more roles for management than tech lead). | | |
| ▲ | codyb 4 hours ago | parent | next [-] | | I feel like if you're really spending a ton of time on off by one errors after twenty years in the field you haven't actually grown much and have probably just spent a ton of time in a single space. Otherwise you'd be senior staff to principal range and doing architecture, mentorship, coordinating cross team work, interviewing, evaluating technical decisions, etc. I got to code this week a bit and it's been a tremendous joy! I see many peers at similar and lower levels (and higher) who have more years and less technical experience and still write lots of code and I suspect that is more what you're talking about. In that case, it's not so much that you've peaked, it's that there's not much to learn and you're doing a bunch of the same shit over and over and that's of course tiring. I think it also means that everything you interact with outside your space does feel much harder because of the infrequency with which you have interacted with it. If you've spent your whole career working the whole stack from interfaces to infrastructure then there's really not going to be much that hits you as unfamiliar after a point. Most frameworks recycle the same concepts and abstractions, same thing with programming languages, algorithms, data management etc. But if you've spent most of your career in one space cranking tickets, those unknown corners are going to be as numerous as the day you started and be much more taxing. | |
| ▲ | rishabhaiover 8 hours ago | parent | prev | next [-] | | That's just sad. Right when I found love in what I do, my work has no value anymore. | | |
| ▲ | jasonfarnon 7 hours ago | parent [-] | | Aren't you still better off than the rest of us who found what they love + invested decades in it before it lost its value? Isn't it better to lose your love when you still have time to find a new one? | | |
| ▲ | josephg 5 hours ago | parent | next [-] | | I don't think so. Those of us who found what we love and invested decades into it got to spend decades getting paid well to do what we love. | |
| ▲ | pesus 7 hours ago | parent | prev | next [-] | | Depends on if their new love provides as much money as their old one, which is probably not likely. I'd rather have had those decades to stash and invest. | | |
| ▲ | jasonfarnon 6 hours ago | parent [-] | | A lot of pre-FAANG engineers don't have the stash you're thinking about. What you meant was "right when I found a lucrative job that I love". What was going on in tech these last 15 years, unfortunately, probably was once in a lifetime. | | |
| ▲ | WarmWash 5 hours ago | parent [-] | | It's crazy to think back in the 80's programmers had "mild" salaries despite programming back then being worlds more punishing. No libraries, no Stack Exchange, no forums, no endless memory or infinite compute. If you had a challenging bug, you'd better also be proficient in reading schematics and probing circuits. |
|
| |
| ▲ | nfredericks 6 hours ago | parent | prev [-] | | This is genuinely such a good take | | |
| ▲ | dugidugout 3 hours ago | parent [-] | | Especially on the topic of value! We are all intuitively aware that value is highly contextual, but get in a knot trying to rationalize value long past genuine engagement! |
|
|
| |
| ▲ | test6554 8 hours ago | parent | prev [-] | | Imagine a senior dev who just approves PRs, approves production releases, and prioritizes bug reports and feature requests. An LLM watches for errors ceaselessly and reports an issue. The senior dev reviews the issue and assigns a severity to it. Another LLM has a backlog of features and errors to go solve; it makes a fix and submits a PR after running tests and verifying things work on its end. |
| |
| ▲ | techgnosis 7 hours ago | parent | prev | next [-] | | Why are we pretending like the need for tenacity will go away? Certain problems are easier now. We can tackle larger problems now that also require tenacity. | | |
| ▲ | samusiam 6 hours ago | parent [-] | | Even right at this very moment where we have a high-tenacity AI, I'd argue that working with the AI -- that is to say, doing AI coding itself and dealing with the novel challenges that brings -- requires a lot of stubborn persistence. |
| |
| ▲ | mykowebhn an hour ago | parent | prev [-] | | Fittingly, Geoffrey Hinton toiled away for years in relative obscurity before finally being recognized for his work. I was always quite impressed by his "tenacity". So although I don't think he should have won the Nobel Prize, because it's not really physics, I felt his perseverance and hard work should merit something. |
|
|
| ▲ | daxfohl 11 hours ago | parent | prev | next [-] |
| I still find in these instances there's at least a 50% chance it has taken a shortcut somewhere: created a new, bigger bug in something that just happened not to have a unit test covering it, or broke an "implicit" requirement that was so obvious to any reasonable human that nobody thought to document it. These can be subtle because you're not looking for them, because no human would ever think to do such a thing. Then even if you do catch it, AI: "ah, now I see exactly the problem. just insert a few more coins and I'll fix it for real this time, I promise!" |
| |
| ▲ | einrealist 5 minutes ago | parent | next [-] | | And there is this paradox where it becomes harder to detect the problems as the models 'improve'. | |
| ▲ | gtowey 10 hours ago | parent | prev | next [-] | | The value extortion plan writes itself. How long before someone pitches the idea that the models deliberately keep almost solving your problem to get you to keep spending? Would you even know? | |
| ▲ | password4321 7 hours ago | parent | next [-] | | First time I've seen this idea; I have a tingling feeling it might become reality sooner rather than later. |
| ▲ | sailfast 9 hours ago | parent | prev | next [-] | | That’s far-fetched. It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise. High value to user + lower compute costs = better pricing power and better margins overall. | | |
| ▲ | d0mine 8 hours ago | parent | next [-] | | > far-fetched Remember Google? Once it was far-fetched that they would make the search worse just to show you more ads. Now, it is a reality. With tokens, it is even more direct. The more tokens users spend, the more money for providers. | | |
| ▲ | retsibsi 4 hours ago | parent | next [-] | | > Now, it is a reality. What are the details of this? I'm not playing dumb, and of course I've noticed the decline, but I thought it was a combination of losing the battle with SEO shite and leaning further and further into a 'give the user what you think they want, rather than what they actually asked for' philosophy. | | | |
| ▲ | throwthrowuknow 7 hours ago | parent | prev [-] | | Only if you are paying per token on the API. If you are paying a fixed monthly fee, then they lose money when you need to burn more tokens, and they lose customers when you can’t solve your problems within that month, max out your session limits, and end up with idle time, which you use to check if the other providers have caught up or surpassed your current favourite. | |
| ▲ | layla5alive 2 hours ago | parent [-] | | Indeed, an unlimited plan seems like the only pricing model that doesn't guarantee abuse by the provider |
|
| |
| ▲ | xienze 8 hours ago | parent | prev [-] | | > It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise. Unless you’re paying by the token. |
| |
| ▲ | Fnoord 7 hours ago | parent | prev | next [-] | | I was thinking more of a deliberate backdoor in code. RCE is an obvious example, but another one could be bias. "I'm sorry ma'am, computer says you are ineligible for a bank account." These ideas aren't new. They were there in the 90s already, when we still thought about privacy and accountability regarding technology, and dystopian novels described them long, long ago. |
| ▲ | fragmede 10 hours ago | parent | prev | next [-] | | The free market proposition is that competition (especially with Chinese labs and grok) means that Anthropic is welcome to do that. They're even welcome to illegally collude with OpenAI such that ChatGPT is similarly gimped. But switching costs are pretty low. If it turns out I can one-shot an issue with Qwen or Deepseek or Kimi thinking, Anthropic loses not just my monthly subscription, but that of everyone else I show it to. So no, I think that's some grade A conspiracy theory nonsense you've got there. | |
| ▲ | coffeefirst 9 hours ago | parent | next [-] | | It’s not that crazy. It could even happen by accident in pursuit of another unrelated goal. And if it did, a decent chunk of the tech industry would call it “revealed preference” because usage went up. | | |
| ▲ | hnuser123456 9 hours ago | parent [-] | | LLMs became sycophantic and effusive because those responses were rated higher during RLHF, until it became newsworthy how obviously eager-to-please they got, so yes, being highly factually correct and "intelligent" was already not the only priority. |
| |
| ▲ | bandrami 6 hours ago | parent | prev | next [-] | | > But switching costs are pretty low Switching costs are currently low. Once you're committed to the workflow the providers will switch to prepaying for a year's worth of tokens. | |
| ▲ | daxfohl 8 hours ago | parent | prev | next [-] | | To be clear I don't think that's what they're doing intentionally. Especially on a subscription basis, they'd rather me maximize my value per token, or just not use them. Lulling users into using tokens unproductively is the worst possible option. The way agents work right now though just sometimes feels that way; they don't have a good way of saying "You're probably going to have to figure this one out yourself". | |
| ▲ | 7 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | jrflowers 9 hours ago | parent | prev | next [-] | | This is a good point. For example if you have access to a bunch of slot machines, one of them is guaranteed to hit the jackpot. Since switching from one slot machine to another is easy, it is trivial to go from machine to machine until you hit the big bucks. That is why casinos have such large selections of them (for our benefit). | | |
| ▲ | krupan 8 hours ago | parent | next [-] | | "for our benefit" lol! This is the best description of how we are all interacting with LLMs now. It's not working? Fire up more "agents" à la gas town or whatever |
| ▲ | robotmaxtron 4 hours ago | parent | prev [-] | | last time I was at a casino I checked to see what company built the machines; imagine my surprise that it was (by my observation) a single vendor. |
| |
| ▲ | thunderfork 9 hours ago | parent | prev [-] | | As a rational consumer, how would you distinguish between some intentional "keep pulling the slot machine" failure rate and the intrinsic failure rate? I feel like saying "the market will fix the incentives" handwaves away the lack of information on internals. After all, look at the market response to Google making their search less reliable - sure, an invested nerd might try Kagi, but Google's still the market leader by a long shot. In a market for lemons, good luck finding a lime. | | |
| |
| ▲ | chanux 3 hours ago | parent | prev [-] | | Is this from a page of dating apps playbook? |
| |
| ▲ | wvenable 9 hours ago | parent | prev | next [-] | | > These can be subtle because you're not looking for them After any agent run, I'm always looking at the git comparison between the new version and the previous one. This helps catch things that you might otherwise not notice. | |
| ▲ | teaearlgraycold 4 hours ago | parent [-] | | And after manually coding I often have an LLM review the diff. 90% of the problems it finds can be discounted, but it’s still a net positive. |
| |
| ▲ | charcircuit 10 hours ago | parent | prev [-] | | You are using it wrong, or are using a weak model if your failure rate is over 50%. My experience is nothing like this. It very consistently works for me. Maybe there is a <5% chance it takes the wrong approach, but you can quickly steer it in the right direction. | | |
| ▲ | testaccount28 10 hours ago | parent [-] | | you are using it on easy questions. some of us are not. | | |
| ▲ | meowface an hour ago | parent | next [-] | | A lot of people are getting good results using it on hard things. Obviously not perfect, but > 50% success. That said, more and more people seem to be arriving at the conclusion that if you want a fairly large-sized, complex task in a large existing codebase done right, you'll have better odds with Codex GPT-5.2-Codex-XHigh than with Claude Code Opus 4.5. It's far slower than Opus 4.5 but more likely to get things correct, and complete, in its first turn. | |
| ▲ | mikkupikku 9 hours ago | parent | prev | next [-] | | I think a lot of it comes down to how well the user understands the problem, because that determines the quality of instructions and feedback given to the LLM. For instance, I know some people have had success with getting claude to do game development. I have never bothered to learn much of anything about game development, but have been trying to get claude to do the work for me. Unsuccessful. It works for people who understand the problem domain, but not for those who don't. That's my theory. | | |
| ▲ | samrus 8 hours ago | parent [-] | | It works for hard problems when the person has already solved it and just needs the grunt work done. It also works for problems that have been solved a thousand times before, which impresses people and makes them think it is actually solving those problems | |
| ▲ | daxfohl 8 hours ago | parent | next [-] | | Which matches what they are. They're first and foremost pattern recognition engines extraordinaire. If they can identify some pattern that's out of whack in your code compared to something in the training data, or a bug that is similar to others that have been fixed in their training set, they can usually thwack those patterns over to your latent space and clean up the residuals. If comparing pattern matching alone, they are superhuman, significantly. "Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape. Their ability to pattern match makes reasoning seem more powerful than it actually is. If your bug is within some reasonable distance of a pattern it has seen in training, reasoning can get it over the final hump. But if your problem is too far removed from what it has seen in its latent space, it's not likely to figure it out by reasoning alone. | | |
| ▲ | charcircuit 7 hours ago | parent [-] | | >"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape. What do you mean by this? Especially for tasks like coding where there is a deterministic correct or incorrect signal it should be possible to train. |
| |
| ▲ | thunky 5 hours ago | parent | prev [-] | | > It also works for problems that have been solved a thousand times before So you mean it works on almost all problems? |
|
| |
| ▲ | baq 10 hours ago | parent | prev [-] | | Don’t use it for hard questions like this then; you wouldn’t use a hammer to cut a plank, you’d try to make a saw instead |
|
|
|
|
| ▲ | fooker 11 hours ago | parent | prev | next [-] |
| > It might become cheaper or it might not If it does not, this is going to be the first technology in the history of mankind that has not become cheaper. (But anyway, it already costs half compared to last year) |
| |
| ▲ | ctoth 10 hours ago | parent | next [-] | | > But anyway, it already costs half compared to last year You could not have bought Claude Opus 4.5 at any price one year ago, I'm quite certain. The things that were available cost half of what they did then, and there are new things available. These are both true. I'm agreeing with you, to be clear. There are two pieces I expect to continue: inference for existing models will continue to get cheaper. Models will continue to get better. Three things, actually. The "hitting a wall" / "plateau" people will continue to be loud and wrong. Just as they have been since 2018[0]. [0]: https://blog.irvingwb.com/blog/2018/09/a-critical-appraisal-... | |
| ▲ | simianwords 10 hours ago | parent | next [-] | | interesting post. i wonder if these people go back and introspect on how incorrect they have been? do they feel the need to address it? | | |
| ▲ | fooker 10 hours ago | parent | next [-] | | No, people do not do that. This is harmless when it comes to tech opinions but causes real damage in politics and activism. People get really attached to ideals and ideas, and keep sticking to those after they fail to work again and again. | | |
| ▲ | simianwords 10 hours ago | parent [-] | | i don't think it is harmless; otherwise we are incentivising people to just say whatever they want without any care for truth. people's reputations should be attached to their predictions. |
| |
| ▲ | cogogo 9 hours ago | parent | prev | next [-] | | Some people definitely do, but how do they go and address it? A fresh example, in that it involves pure misinformation: I just screwed up and told some neighbors garbage collection was delayed for a day because of almost 2ft of snow. Turns out it was just the food waste pickup, and I was distracted checking the app and read the notification poorly. I went back to tell them (don't know them at all, just everyone is chattier digging out of a storm) and they were not there. Feel terrible, and there's no real viable remedy. Hope they check themselves and realize I am an idiot. Even harder on the internet. |
| ▲ | maest 5 hours ago | parent | prev [-] | | Do _you_ do that? | | |
| |
| ▲ | teaearlgraycold 3 hours ago | parent | prev | next [-] | | As a user of LLMs since GPT-3 there was noticeable stagnation in LLM utility after the release of GPT-4. But it seems the RLHF, tool calling, and UI have all come together in the last 12 months. I used to wonder what fools could be finding them so useful to claim a 10x multiplier - even as a user myself. These days I’m feeling more and more efficiency gains with Claude Code. | | |
| ▲ | HNisCIS 41 minutes ago | parent [-] | | That's the thing people are missing: the models plateaued a while ago, still making minor gains to this day, but not huge ones. The difference is now we've had time to figure out the tooling. I think there's still a ton of ground to cover there, and maybe the models will improve given the extra time, but I think it's foolish to consider people who predicted that completely wrong. There are also a lot of mathematical concerns that will cause problems in the near and distant future. Infinite progress is far from a given; we're already way behind where all the boosters thought we'd be by now. |
| |
| ▲ | bsder 8 hours ago | parent | prev [-] | | > The "hitting a wall" / "plateau" people will continue to be loud and wrong. Just as they have been since 2018[0]. Everybody who bet against Moore's Law was wrong ... until they weren't. And AI is the reaction to Moore's Law having broken. Nobody gave one iota of damn about trying to make programming easier until the chips couldn't double in speed anymore. | | |
| ▲ | twoodfin 8 hours ago | parent [-] | | This is exactly backwards: Dennard scaling stopped. Moore’s Law has continued and it’s what made training and running inference on these models practical at interactive timescales. | | |
| ▲ | bsder 7 hours ago | parent [-] | | You are technically correct. The best kind of correct. However, most people don't know the difference between the proper Moore's Law scaling (the cost of a transistor halves every 2 years) which is still continuing (sort of) and the colloquial version (the speed of a transistor doubles every 2 years) which got broken when Dennard scaling ran out. To them, Moore's Law just broke. Nevertheless, you are reinforcing my point. Nobody gave a damn about improving the "programming" side of things until the hardware side stopped speeding up. And rather than try to apply some human brainpower to fix the "programming" side, they threw a hideous number of those free (except for the electricity--but we don't mention that--LOL) transistors at the wall to create a broken, buggy, unpredictable machine simulacrum of a "programmer". (Side note: And to be fair, it looks like even the strong form of Moore's Law is finally slowing down, too) | | |
| ▲ | twoodfin 6 hours ago | parent [-] | | If you can turn a few dollars of electricity per hour into a junior-level programmer who never gets bored, tired, or needs breaks, that fundamentally changes the economics of information technology. And in fact, the agentic looped LLMs are executing much better than that today. They could stop advancing right now and still be revolutionary. |
|
|
|
| |
| ▲ | peaseagee 10 hours ago | parent | prev | next [-] | | That's not true. Many technologies get more expensive over time, as labor gets more expensive or as certain skills fall by the wayside; not everything is mass market. Have you tried getting a grandfather clock repaired lately? | |
| ▲ | willio58 10 hours ago | parent | next [-] | | Repairing grandfather clocks isn't more expensive now because it's gotten any harder; it's because the popularity of grandfather clocks is basically nonexistent compared to anything else to tell time. | |
| ▲ | simianwords 10 hours ago | parent | prev | next [-] | | "repairing a unique clock" getting costlier doesn't mean technology hasn't gotten cheaper. check out whether clocks have gotten cheaper in general. the answer is that they have. there is no economy of scale in repairing a single clock, so it's not relevant to bring up here. | |
| ▲ | ipaddr 9 hours ago | parent [-] | | Clock prices have gone up since 2020. Unless a cheaper, better way to make clocks has emerged, inflation causes prices to grow. | |
| ▲ | fooker 9 hours ago | parent | next [-] | | Luxury watches have gone up; 'clocks' as a technology is cheaper than ever. You can buy one for 90 cents on Temu. | |
| ▲ | ipaddr 8 hours ago | parent [-] | | The landed cost for that 90 cent watch has gone way up. Shipping and, to some degree, taxes have pushed the price higher. | |
| ▲ | pas 7 hours ago | parent [-] | | that's not the technology. of course it's silly to talk about manufacturing methods and yield and cost efficiency without having an economy to embed all of this into, but ... "technology got cheaper" means that we have practical knowledge of how to make cheap clocks (given certain supply chains, given certain volume, and so on). we can make very cheap, very accurate clocks that can be embedded into whatever devices, but it requires the availability of fabs capable of doing MEMS components, supply materials, etc. |
|
| |
| ▲ | simianwords 9 hours ago | parent | prev [-] | | not true, clocks have gone down after accounting for inflation. verified using ChatGPT. | | |
| ▲ | ipaddr 8 hours ago | parent [-] | | You can't account for inflation because the price increase is inflation. | | |
| ▲ | pas 7 hours ago | parent | next [-] | | you can look at a basket of goods that doesn't have your specific product and compare directly. but inflation is the general price level increase; this can be used as a deflator to get the price of whatever product in past/future money amounts, to see how the price of the product changed in "real" terms (ie. relative to the general price level change) | |
| ▲ | simianwords 8 hours ago | parent | prev [-] | | this is not true |
|
|
|
| |
| ▲ | esafak 10 hours ago | parent | prev | next [-] | | Instead of advancing tenuous examples, you could suggest a realistic mechanism by which costs could rise, such as a Chinese advance on Taiwan, affecting TSMC, etc. | |
| ▲ | emtel 9 hours ago | parent | prev | next [-] | | Time-keeping is vastly cheaper. People don't want grandfather clocks. They want to tell time. And they can, more accurately, more easily, and much cheaper than their ancestors. | |
| ▲ | groby_b 10 hours ago | parent | prev | next [-] | | No. You don't get to make "technology gets more expensive over time" statements for deprecated technologies. Getting a bespoke flintstone axe is also pretty expensive, and has absolutely no relevance to modern life. These discussions must, if they are to be useful, center on population-level experience, not unique personal moments. | |
| ▲ | ipaddr 9 hours ago | parent | next [-] | | I purchased a 5T drive in 2019 and the price is higher now despite newer, better drives going on the market since. Not much has gone down in price over the last few years. | |
| ▲ | solomonb 9 hours ago | parent | prev | next [-] | | okay how about the Francis Scott Key Bridge? https://marylandmatters.org/2025/11/17/key-bridge-replacemen... | | |
| ▲ | groby_b 7 hours ago | parent [-] | | You will get a different bridge. With very different technology. Same as "I can't repair my grandfather clock cheaply". In general, there are several things that are true for bridges that aren't true for most technology: * Technology has massively improved, but most people are not realizing that. (E.g. the Bay Bridge cost significantly more than the previous version, but that's because we'd like to not fall down again in the next earthquake)
* We still have little idea how to reason about the cost of bridges in general. (Seriously. It's an active research topic)
* It's a tiny market, with the major vendors forming an oligopoly
* It's infrastructure, not a standard good
* The buy side is almost exclusively governments. All of these mean expensive goods that are completely non-repeatable. You can't build the same bridge again. And on top of that, in a distorted market. But sure, the cost of "one bridge, please" has gone up over time. | | |
| ▲ | solomonb 6 hours ago | parent | next [-] | | This seems largely the same as any other technology. The prices of new technologies go down initially as we scale up and optimize their production, but as soon as demand fades, due to newer technology or whatever, the cost of that technology goes up again. | |
| ▲ | fooker 7 hours ago | parent | prev [-] | | > But sure, the cost of "one bridge, please" has gone up over time. Even if you adjust for inflation? |
|
| |
| ▲ | arthurbrown 9 hours ago | parent | prev [-] | | Bought any RAM lately? Phone? GPU in the last decade? | | |
| ▲ | ipaddr 9 hours ago | parent [-] | | The latest iPhone has gone down in price? It's double. I guess the marketing is working. | |
| ▲ | xnyan 7 hours ago | parent [-] | | "Pens are not cheaper, look at this Montblanc" is not a good faith response. '84 Motorola DynaTAC - ~$12k AfI (adjusted for inflation) '89 MicroTAC ~$8k AfI '96 StarTAC ~$2k AfI `07 iPhone ~$673 AfI The current average smartphone sells for around $280. Phones are getting cheaper. |
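For the curious, the AfI figures above are just a CPI ratio: multiply the old list price by (CPI now / CPI then). A rough sketch in Python; the CPI-U values below are approximate figures assumed for illustration, not taken from the comment:

```python
# Inflation adjustment via the CPI ratio: price_then * (cpi_now / cpi_then).
# CPI-U values are rough approximations (assumed for this sketch).
CPI = {1984: 103.9, 1989: 124.0, 1996: 156.9, 2007: 207.3, 2024: 314.0}

def adjust_for_inflation(price: float, year_then: int, year_now: int = 2024) -> float:
    """Convert a historical price into year_now dollars."""
    return price * CPI[year_now] / CPI[year_then]

# The DynaTAC listed at $3,995 in 1984:
print(round(adjust_for_inflation(3995, 1984)))  # ~12073, matching the ~$12k AfI above
```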
|
|
| |
| ▲ | epidemiology 6 hours ago | parent | prev [-] | | Or riding in an uber? |
| |
| ▲ | fulafel 2 hours ago | parent | prev | next [-] | | I don't think computation is going to become more expensive, but there are techs that have become so: Nuclear power plants. Mobile phones. Oil extraction. (Oil rampdown is a survival imperative due to the climate catastrophe so there it's a very positive thing of course, though not sufficient...) | |
| ▲ | InsideOutSanta 10 hours ago | parent | prev | next [-] | | Sure, running an LLM is cheaper, but the way we use LLMs now requires way more tokens than last year. | | |
| ▲ | fooker 10 hours ago | parent | next [-] | | 10x more tokens today cost less than half of what X tokens cost in ~mid 2024. | |
| ▲ | simianwords 10 hours ago | parent | prev [-] | | ok but the capabilities are also rising. what point are you trying to make? | | |
| ▲ | oytis 10 hours ago | parent [-] | | That it's not getting cheaper? | | |
| ▲ | jstummbillig 10 hours ago | parent | next [-] | | But it is, capability adjusted, which is the only way it makes sense. You can definitely produce last years capability at a huge discount. | |
| ▲ | simianwords 10 hours ago | parent | prev [-] | | you are wrong. https://epoch.ai/data-insights/llm-inference-price-trends this is accounting for the fact that more tokens are used. | | |
| ▲ | techpression 10 hours ago | parent [-] | | The chart shows that they’re right though. Newer models cost more than older models. Sure they’re better, but that’s moot if older models are not available or can’t solve the problem they’re tasked with. | |
| ▲ | simianwords 9 hours ago | parent | next [-] | | this is incorrect. the cost to achieve the same task with old models is way higher than with new models. > Newer models cost more than older models where did you see this? | |
| ▲ | techpression 9 hours ago | parent [-] | | On the link you shared, 4o vs 3.5 turbo price per 1m tokens. There’s no such thing as “same task by old model”, you might get comparable results or you might not (and this is why the comparison fails, it’s not a comparison); the reason you pick the newer models is to increase the chances of getting a good result. | |
| ▲ | simianwords 9 hours ago | parent [-] | | > The dataset for this insight combines data on large language model (LLM) API prices and benchmark scores from Artificial Analysis and Epoch AI. We used this dataset to identify the lowest-priced LLMs that match or exceed a given score on a benchmark. We then fit a log-linear regression model to the prices of these LLMs over time, to measure the rate of decrease in price. We applied the same method to several benchmarks (e.g. MMLU, HumanEval) and performance thresholds (e.g. GPT-3.5 level, GPT-4o level) to determine the variation across performance metrics This should answer. In your case, GPT-3.5 definitely is cheaper per token than 4o but much much less capable. So they used a model that is cheaper than GPT-3.5 that achieved better performance for the analysis. |
|
| |
| ▲ | fooker 9 hours ago | parent | prev [-] | | OpenAI has always priced newer models lower than older ones. | | |
|
|
|
|
| |
| ▲ | root_axis 8 hours ago | parent | prev | next [-] | | Not true. Bitcoin has continued to rise in cost since its introduction (as in the aggregate cost incurred to run the network). LLMs will face their own challenges with respect to reducing costs, since self-attention grows quadratically. These are still early days, so there remains a lot of low hanging fruit in terms of optimizations, but all of that becomes negligible in the face of quadratic attention. | | | |
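For intuition on why the quadratic term eventually swamps constant-factor optimizations: vanilla self-attention builds an n x n score matrix, so that part of the cost quadruples whenever the context doubles. A minimal sketch in Python; the token counts and head dimension are illustrative assumptions:

```python
# Vanilla self-attention forms an n x n score matrix (QK^T), then an
# attention-weighted sum over V; each contributes ~2 * n^2 * d_head FLOPs.
# Projections and MLP layers, by contrast, scale only linearly in n.
def attention_flops(n_tokens: int, d_head: int) -> int:
    scores = 2 * n_tokens**2 * d_head        # QK^T
    weighted_sum = 2 * n_tokens**2 * d_head  # softmax(QK^T) @ V
    return scores + weighted_sum

for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens: {attention_flops(n, d_head=128):.2e} FLOPs")
# Doubling the context quadruples this term, so a one-time 10x kernel
# speedup is eaten by a mere ~3.2x increase in context length.
```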
| ▲ | krupan 8 hours ago | parent | prev | next [-] | | There are plenty of technologies that have not become cheaper, or at least not cheap enough to go big and change the world. You probably haven't heard of them because obviously they didn't succeed. |
| ▲ | asadotzler 8 hours ago | parent | prev | next [-] | | cheaper doesn't mean cheap enough to be viable after the bills come due |
| ▲ | ak_111 9 hours ago | parent | prev | next [-] | | Concorde? | |
| ▲ | runarberg 2 hours ago | parent | prev [-] | | Supersonic jet engines, rockets to the moon, nuclear power plants, etc. have all become more expensive. Superconductors were discovered in 1911, and we have been making them since the 1950s, as long as we have been making transistors, yet superconductors show no sign of becoming cheaper any time soon. There have been plenty of technologies in history which do not in fact become cheaper. LLMs are very likely to become one such technology, as I suspect their usefulness will be superseded by cheaper (much cheaper, in fact) specialized models. |
|
|
| ▲ | bob1029 3 hours ago | parent | prev | next [-] |
| Humans run hot too. Once you factor in the supply chain that keeps us alive, things become surprisingly equivalent. Eating burgers and driving cars around costs a lot more than whatever # of watts the human brain consumes. |
| |
| ▲ | bbor an hour ago | parent [-] | | I mean, “equivalent” is an understatement! There’s a reason Claude Code costs less than hiring a full time software engineer… |
|
|
| ▲ | chasebank 2 hours ago | parent | prev | next [-] |
| I don’t understand this pov. Unfortunately, I’d pay $10k/mo for my CC sub. I wish I could invest in Anthropic; they’re going to be the most profitable company on earth |
|
| ▲ | redox99 8 hours ago | parent | prev | next [-] |
| > And you most likely do not pay the actual costs. This is one of the weakest anti-AI postures. "It's a bubble and when the free VC money stops you'll be left with nothing". Like it's some kind of mystery how expensive these models are to run. You have open-weight models right now like Kimi K2.5 and GLM 4.7. These are very strong models, only months behind the top labs. And they are not very expensive to run at scale. You can do the math. In fact, there are third parties serving these models for profit. The money pit is training these models (and not even that much if you are efficient, like the Chinese labs). Once they are trained, they are served with large profit margins compared to the inference cost. OpenAI and Anthropic are without a doubt selling their API for a lot more than the cost of running the model. |
|
| ▲ | crazygringo 8 hours ago | parent | prev | next [-] |
| > Somewhere, there are GPUs/NPUs running hot. Running at their designed temperature. > You send all the necessary data, including information that you would never otherwise share. I've never sent the type of data that isn't already either stored by GitHub or a cloud provider, so no difference there. > And you most likely do not pay the actual costs. So? Even if costs double once investor subsidies stop, that doesn't change much of anything. And the entire history of computing is that things tend to get cheaper. > You and your business become dependent on this major gatekeeper. Not really. Switching between Claude and Gemini or whatever new competition shows up is pretty easy. I'm no more dependent on it than I am on any of another hundred business services or providers that similarly mostly also have competitors. |
|
| ▲ | karlgkk an hour ago | parent | prev | next [-] |
| > And you most likely do not pay the actual costs Oh my lord, you absolutely do not. OpenAI’s per-token costs for inference ALONE are at least 7x what you pay. AT LEAST, and from what I’ve heard, much higher. |
| |
| ▲ | tgrowazay an hour ago | parent [-] | | We can observe how much generic inference providers like deepinfra or together-ai charge for large SOTA models. Since they are not subsidized and they don’t charge 7x OpenAI’s prices, that means OAI also doesn’t have outrageously high per-token costs. |
|
|
| ▲ | mikeocool 9 hours ago | parent | prev | next [-] |
| To me this tenacity is often like watching someone trying to get a screw into a board using a hammer. There’s often a better, faster way to do it, and while it might get to the short-term goal eventually, it’s often created some long-term problems along the way. |
|
| ▲ | hahahahhaah 9 hours ago | parent | prev | next [-] |
| It is also amazing seeing the Linux kernel work, scheduling threads, handling interrupts and API calls, all without breaking a sweat or injuring its ACL. |
|
| ▲ | YetAnotherNick 10 hours ago | parent | prev | next [-] |
| With optimizations and new hardware, power is almost a negligible cost. You can get 5.5M tokens/s/MW[1] for Kimi K2 (=20M tokens/kWh = 181M tokens/$), which is 400x cheaper than current pricing. It's just Nvidia/TSMC/other manufacturers eating up the profit now because they can. My bet is that China will match current Nvidia within 5 years. [1]: https://developer-blogs.nvidia.com/wp-content/uploads/2026/0... |
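For what it's worth, the arithmetic in the comment above checks out once you fill in one unstated number, the electricity price. A quick sanity check in Python; the $0.11/kWh rate and the ~$2.20 per million tokens retail price used for the 400x comparison are assumptions, not figures from the comment:

```python
# Power-only cost of inference at the quoted throughput.
tokens_per_s_per_mw = 5.5e6                            # figure quoted for Kimi K2 [1]
tokens_per_kwh = tokens_per_s_per_mw / 1_000 * 3_600   # ~19.8M, i.e. the "20M/kWh"

usd_per_kwh = 0.11                                     # assumed electricity price
tokens_per_usd = tokens_per_kwh / usd_per_kwh          # ~180M, the "181M tokens/$"

power_cost_per_m = 1e6 / tokens_per_usd                # ~$0.0055 per 1M tokens
retail_per_m = 2.20                                    # assumed retail $/1M tokens
print(f"~{retail_per_m / power_cost_per_m:.0f}x")      # ~396x, i.e. the "400x"
```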
| |
| ▲ | storystarling 9 hours ago | parent [-] | | Electricity is negligible but the dominant cost is the hardware depreciation itself. Also inference is typically memory bandwidth bound so you are limited by how fast you can move weights rather than raw compute efficiency. | | |
| ▲ | YetAnotherNick 3 hours ago | parent [-] | | Yes, because the margin is like 80% for Nvidia, and 80% again for manufacturers like Samsung and TSMC. Once fixed costs like R&D are amortized, the same node technology and hardware capacity could cost just a few percent of what it does now (two stacked ~80% margins leave 0.2 × 0.2 = 4% of the sticker price). |
|
|
|
| ▲ | utopiah an hour ago | parent | prev [-] |
| AI geniuses discover brute forcing... what a time to be alive. /s Like... bro, that's THE foundation of CS. That's the principle behind the Bombe in Turing's time. One can still marvel at it, but it's been with us since the beginning. |