| ▲ | The Future of AI Software Development (martinfowler.com) |
| 121 points by nthypes 2 hours ago | 73 comments |
| |
|
| ▲ | chadash 2 hours ago | parent | next [-] |
> Will LLMs be cheaper than humans once the subsidies for tokens go away? At this point we have little visibility to what the true cost of tokens is now, let alone what it will be in a few years time. It could be so cheap that we don’t care how many tokens we send to LLMs, or it could be high enough that we have to be very careful. We do have some idea. Kimi K2 is a relatively high-performing open source model. People have it running at 24 tokens/second on a pair of Mac Studios, which costs about $20k. This setup requires less than a kW of power, so the $0.08-0.15 an hour being spent on electricity is negligible compared to a developer. This might be the cheapest setup to run locally, but it's almost certain that the cost per token is far cheaper with specialized hardware at scale. In other words, a near-frontier model is running at a cost that a (somewhat wealthy) hobbyist can afford (rough arithmetic sketched below). And it's hard to imagine that the hardware costs don't come down quite a bit. I don't doubt that tokens are heavily subsidized, but I think this might be overblown [1]. [1] Training models is still extraordinarily expensive and that is certainly being subsidized, but you can amortize that cost over a lot of inference, especially once we reach a plateau for ideas and stop running training runs as frequently. |
| |
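A rough back-of-envelope sketch of the cost described above. The 24 tokens/second, ~$20k hardware, and <1 kW figures come from the comment; the electricity price, amortization period, and utilization are assumptions made purely for illustration:

```python
# Back-of-envelope cost per million tokens for the local Kimi K2 setup above.
TOKENS_PER_SEC = 24
HARDWARE_COST = 20_000          # USD, pair of Mac Studios (from the comment)
AMORTIZATION_YEARS = 3          # assumed useful life of the hardware
POWER_KW = 1.0                  # upper bound from the comment
ELECTRICITY_USD_PER_KWH = 0.15  # assumed residential rate
UTILIZATION = 0.5               # assumed fraction of time spent generating

seconds_per_year = 365 * 24 * 3600
tokens_per_year = TOKENS_PER_SEC * seconds_per_year * UTILIZATION

hardware_per_year = HARDWARE_COST / AMORTIZATION_YEARS
power_per_year = POWER_KW * 24 * 365 * ELECTRICITY_USD_PER_KWH * UTILIZATION

cost_per_mtok = (hardware_per_year + power_per_year) / (tokens_per_year / 1e6)
print(f"~${cost_per_mtok:.0f} per million tokens")  # ~$19/Mtok under these assumptions
```

Under these assumptions the whole rig works out to roughly $20 a day in amortized hardware and power, however hard it is driven, which is the "negligible compared to a developer" point.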
| ▲ | embedding-shape an hour ago | parent | next [-] | | > a near-frontier model Is Kimi K2 near-frontier though? At least when run in an agent harness, and for general coding questions, it seems pretty far from it. I know what the benchmarks say, they always say it's great and close to frontier models, but is this others' impression in practice? Maybe my prompting style works best with GPT-type models, but I'm just not seeing that for the type of engineering work I do, which is fairly typical stuff. | | |
| ▲ | crystal_revenge an hour ago | parent | next [-] | | I’ve been running K2.5 (through the API) as my daily driver for coding through Kimi Code CLI and it’s been pretty much flawless. It’s also notably cheaper and I like the option that if my vibe coded side projects became more than side projects I could run everything in house. I’ve been pretty active in the open model space and 2 years ago you would have had to pay 20k to run models that were nowhere near as powerful. It wouldn’t surprise me if in two more years we continue to see more powerful open models on even cheaper hardware. | | |
| ▲ | vuldin 44 minutes ago | parent | next [-] | | I agree with this statement. Kimi K2.5 is at least as good as the best closed source models today for my purposes. I've switched from Claude Code w/ Opus 4.5 to OpenCode w/ Kimi K2.5 provided by Fireworks AI. I never run into time-based limits, whereas before I was running into daily/hourly/weekly/monthly limits all the time. And I'm paying a fraction of what Anthropic was charging (from well over $100 per month to less than $50 per month). | |
| ▲ | embedding-shape 38 minutes ago | parent | prev [-] | | > it’s been pretty much flawless So above and beyond frontier models? Because they certainly aren't "flawless" yet, or we have very different understanding of that word. |
| |
| ▲ | fullstackchris an hour ago | parent | prev [-] | | Regardless, it's been 3 years since the release of ChatGPT. Literally 3. Imagine in just 5 more years how much low-hanging fruit (or even big breakthroughs) will get into the pricing, things like quantization, etc. No doubt in my mind the question of "price per token" will head towards 0 |
| |
| ▲ | lambda an hour ago | parent | prev | next [-] | | You don't even need to go this expensive. An AMD Ryzen Strix Halo (AI Max+ 395) machine with 128 GiB of unified RAM will set you back about $2500 these days. I can get about 20 tokens/s on Qwen3 Coder Next at an 8-bit quant, or 17 tokens per second on Minimax M2.5 at a 3-bit quant. Now, these models are a bit weaker, but they're in the realm of Claude Sonnet to Claude Opus 4: 6-12 months behind SOTA on something that's well within a personal hobby budget. | |
| ▲ | cowmix an hour ago | parent [-] | | If you don't mind saying, what distro and/or Docker container are you using to get Qwen3 Coder Next going? |
| |
| ▲ | consp an hour ago | parent | prev | next [-] | | $20k for such a setup for a hobbyist? You can leave the "somewhat" away and go into the sub-1% region globally. A kW of power is still $2k/year at least for me; not that I expect it to run continuously, but it's still not negligible when you can get by with $100-200 a year on cheap subscriptions. | |
| ▲ | dec0dedab0de 27 minutes ago | parent | next [-] | | There are plenty of normal people with hobbies that cost much more. Off the top of my head, recreational vehicles like racecars and motorcycles, but I'm sure there are others. You might be correct when you say the global 1%, but that's still 83 million people. | |
| ▲ | markb139 19 minutes ago | parent [-] | | I used to think photography was an expensive hobby until my wife got back into the horse world. |
| |
| ▲ | simonw an hour ago | parent | prev [-] | | "a (somewhat wealthy) hobbyist" |
| |
| ▲ | msp26 8 minutes ago | parent | prev | next [-] | | Horrific comparison point. LLM inference is way more expensive locally for single users than running batch inference at scale in a datacenter on actual GPUs/TPUs. | | |
| ▲ | AlexandrB 3 minutes ago | parent [-] | | How is that horrific? It sets an upper bound on the cost, which turns out to be not very high. |
| |
| ▲ | manwe150 an hour ago | parent | prev | next [-] | | Reminder to others that $20k is the one-time startup cost, which amortizes to perhaps $2-4k/year (plus power). That is in the realm of a mere family gym membership around me | |
| ▲ | newsoftheday an hour ago | parent | prev | next [-] | | > a cost that a (somewhat wealthy) hobbyist can afford $20,000 is a lot to drop on a hobby. We're probably talking less than 10%, maybe less than 5% of all hobbyists could afford that. | | |
| ▲ | charcircuit an hour ago | parent [-] | | You can rent compute from someone else to majorly reduce the spend. If you just pay for tokens it will be cheaper than buying the entire computer outright. |
| |
| ▲ | qaq an hour ago | parent | prev | next [-] | | If I remember correctly Dario had claimed that AI inference gross profit margins are 40%-50% | |
| ▲ | PlatoIsADisease an hour ago | parent | prev [-] | | > 24 tokens/second This is marketing, not reality. Get a few lines of code and it becomes unusable. |
|
|
| ▲ | PaulHoule 37 minutes ago | parent | prev | next [-] |
Get over your FOMO:
> I walked into that room expecting to learn from people who were further ahead. People who’d cracked the code on how to adopt AI at scale, how to restructure teams around it, how to make it work. Some of the sharpest minds in the software industry were sitting around those tables. And nobody has it all figured out.
People who say they have are trying to mess with your head. |
|
| ▲ | simonw 2 hours ago | parent | prev | next [-] |
| > LLMs are eating specialty skills. There will be less use of specialist front-end and back-end developers as the LLM-driving skills become more important than the details of platform usage. Will this lead to a greater recognition of the role of Expert Generalists? Or will the ability of LLMs to write lots of code mean they code around the silos rather than eliminating them? This is one of the most interesting questions right now I think. I've been taking on much more significant challenges in areas like frontend development and ops and automation and even UI design now that LLMs mean I can be much more of a generalist. Assuming this works out for more people, what does this mean for the shape of our profession? |
| |
| ▲ | neebz an hour ago | parent | next [-] | | I've faced the same but my conclusion is the opposite. In the past 6 months, all my code has been written by Claude Code and Gemini CLI. I have written backend, frontend, infrastructure and iOS code. Considering my career trajectory, all of this was impossible a couple of years ago. But the technical debt has been enormous. And I'll be honest, my understanding of these technologies hasn't been 'expert' level. I'm 100% sure any experienced dev could go through my code and think it's a load of crap requiring serious re-architecture. It works (that's great!) but the 'software engineering' side of things is still subpar. | |
| ▲ | crystal_revenge an hour ago | parent | next [-] | | A lot of people aren’t realizing that it’s not about replacing software engineers, it’s about replacing software. We’ve been trying to build well engineered, robust, scalable systems because software had to be written to serve other users. But LLMs change that. I have a bunch of vibe coded command line tools that exactly solve my problems, but very likely would make terrible software. The thing is, this program only needs to run on my machine the way I like to use it. In a growing class of cases bespoke tools are superior to generalized software. This historically was not the case because it took too much time and energy to maintain these things. But today if my vibe coded solution breaks, I can rebuild it almost instantly (because I understand the architecture). It takes less time today to build a bespoke tool that solves your problem than it does to learn how to use existing software. There’s still plenty of software that cannot be replaced with bespoke tools, but that list is shrinking. | |
| ▲ | munk-a 35 minutes ago | parent [-] | | I absolutely believe in that value proposition - but I've heard a lot about how beneficial it will be for large organizationally backed software products. If it isn't valuable in that latter scenario (which I have uncertainty about) then there is no way companies like OpenAI could ever justify their valuations. | |
| ▲ | crystal_revenge 9 minutes ago | parent [-] | | > there is no way companies like OpenAI could ever justify their valuations The value proposition isn't really "we'll help you write all the code for your company", it's a world where the average user's computer is a dumb terminal that opens up to a ChatGPT interface. I didn't initially understand the value prop but have increasingly come to see it. The gamble is that LLMs will be your interface to everything the same way HTTP was for the last 20 years. The mid-90s had a similar mix of deep skepticism and hype-driven madness (and if you read my comments you'll see I've historically been much closer to the skeptic side, despite a lot of experience in this space). But even in the 90s the hyped-up bubble riders didn't really see the idea that HTTP would be how everything happens. We've literally hacked a document format and document serving protocol to build the entire global application infrastructure. We saw a similar transformation with mobile devices where most of your world lives on a phone and the phone maker gets a nice piece of that revenue. People thought Zuck was insane for his metaverse obsession, but what he was chasing was that next platform. He was wrong of course, but his hope was that VR would be the way people did everything. Now this is what the LLM providers are really after. Claude/ChatGPT/Grok will be your world. You won't have to buy SaaS subscriptions for most things because you can just build it yourself. Why use Hubspot when you can just have AI do all your marketing? Then you just need Hubspot for their message sending infrastructure. Why pay for a budgeting app when you can just build a custom one that lives on OpenAI's servers (today your computer, but tomorrow theirs)? Companies like banks will maintain interfaces to LLMs but you won't be doing your banking in their web app. Even social media will ultimately be replaced by an endless stream of bespoke images, video and content made just for you (and of course it will be much easier to inject advertising into this space you don't even recognize as advertising). The value prop is that these large, well-funded AI companies will just eat large chunks of industry. |
|
| |
| ▲ | mikkupikku an hour ago | parent | prev [-] | | Similar experience for me. I've been using it to make Qt GUIs, something I always avoided in the past because it seemed like a whole lot of stuff to learn when I could just make a TUI or use Tkinter if I really needed a GUI for some reason. Claude Code is producing working useful GUIs for me using Qt via pyside6. They work well but I have no doubt that a dev with real experience with Qt would shudder. Nonetheless, because it does work, I am content to accept that this code isn't meant to be maintained by people so I don't really care if it's ugly. |
| |
| ▲ | petcat an hour ago | parent | prev | next [-] | | Code is, I think, rapidly becoming a commodity. It used to be that the code itself was what was valuable (Microsoft MS-DOS vs. the IBM PC hardware). And it has stayed that way for a long time. FOSS meant that the cost of building on reusable components was nearly zero. Large public clouds meant the cost of running code was negligible. And now the model providers (Anthropic, Google, OpenAI) mean that the cost of producing the code is relatively small. When the marginal cost of producing code approaches zero, we start optimizing for all the things around it. Code is now like steel. It's somewhat valuable by itself, but we don't need the town blacksmith to make us things anymore. What is still valuable is the intuition to know what to build, and when to build it. That's the je ne sais quoi still left in our profession. | |
| ▲ | simonw 11 minutes ago | parent | next [-] | | > What is still valuable is the intuition to know what to build, and when to build it. That's the je ne sais quoi still left in our profession. Absolutely. Also crucial is what's possible to build. That takes a great deal of knowledge and experience, and is something that changes all the time. | |
| ▲ | rawgabbit an hour ago | parent | prev | next [-] | | From https://annievella.com/posts/finding-comfort-in-the-uncertai... “Ideas that surfaced: code as ‘just another projection’ of intended behaviour. Tests as an alternative projection. Domain models as the thing that endures. One group posed the provocative question: what would have to be true for us to ‘check English into the repository’ instead of code? The implications are significant. If code is disposable and regenerable, then what we review, what we version-control, and what we protect all need rethinking.” | |
| ▲ | Rover222 an hour ago | parent | prev | next [-] | | Yes, agreed that coding (implementation), which was once extremely expensive for businesses, is trending towards a negligible price. Planning, coordination, and strategy at a high level are as challenging as ever. I'm getting more done than ever, but NOT working fewer hours in a day (as an employee at a product company). | |
| ▲ | HPsquared an hour ago | parent | prev | next [-] | | Like column inches in a newspaper. But some news is important and that's the editor's job to decide. | |
| ▲ | softwaredoug an hour ago | parent | prev [-] | | I’d say the jury might be out on whether code is worthless for giant pieces of infrastructure (Linux kernel). There, small problems create outsized issues for everybody, so the incentive is to be conservative and focused on quality. Second there’s a world of difference still between a developer with taste using AI with care and the slop cannons out there churning out garbage for others to suffer through. I’m betting there is value in the former in the long run. |
| |
| ▲ | SignalStackDev 17 minutes ago | parent | prev | next [-] | | Both forces are playing out simultaneously - which is what makes this hard to forecast. The generalist capability boost is real. I'm shipping things that would have required frontend, backend, and devops specialists two years ago. But a new specialization is quietly emerging alongside that: people who understand how LLM pipelines behave in production. This is genuinely hard knowledge that doesn't transfer from traditional engineering. Multi-step agent pipelines fail in ways that look nothing like normal software bugs - context contamination between model calls, confidence-correlated hallucinations that vary by model family, retry logic that creates feedback loops in agentic chains. Debugging this requires understanding the statistical behavior of models as much as the code. My guess: the profession splits more than it unifies. Most developers will use LLMs to be faster generalists on standard work. A smaller group will specialize in building the infrastructure those LLMs run on - model routing, context management, failure isolation, eval pipelines. That second group isn't really a generalist or a traditional specialist. It's something new. The Fowler article's 'supervisory middle loop' concept hints at this - someone has to monitor what the agents are doing, and that role requires both breadth and a very specific kind of depth. | |
| ▲ | AutumnsGarden an hour ago | parent | prev [-] | | I’ve become the same way. Instead of specializing in the unique implementations, I’ve leaned more into planning everything out even more completely and writing skills backed by industry standards and other developers’ best practices (also including LOTS of anti-patterns). My workflow has improved dramatically since then, but I do worry that I am not developing the skills to properly _debug_ these implementations, as the skills did most of the work. | |
| ▲ | mjr00 an hour ago | parent [-] | | IMO debugging is a separate skill from development anyway. I've known plenty of developers in my career who were fully capable of writing and shipping code, especially the kind of boilerplate widgets/RPCs that LLMs excel at generating, yet if a bug happened their approach was largely just changing somewhat random stuff to see if it worked rather than anything methodical. If you want to get/stay good at debugging--again IMO--it's more important to be involved in operations, where shit goes wrong in the real world because you're dealing with real invalid data that causes problems like poison pill messages stuck in a message queue, real hardware failures causing services to crash, real network problems like latency and timeouts that cause services which work in the happy path to crumble under pressure. Not only does this instil a more methodical mentality in you, it also makes you a better developer because you think about more classes of potential problems and how to handle them. |
|
|
|
| ▲ | tabs_or_spaces 21 minutes ago | parent | prev | next [-] |
| > Will this lead to a greater recognition of the role of Expert Generalists?
I've always felt that LLMs can make you average in a new area/topic/domain really quickly. But you still need expertise to make the most out of the LLM. Personally, I'm more interested in whether software development has become more or less pay-to-win with LLMs. |
|
| ▲ | mehagar 19 minutes ago | parent | prev | next [-] |
| It's refreshing to hear people say "We're not really sure" in public, especially from experts. I agree that AI tools are likely to amplify the importance of quick cycles and continuous delivery. |
|
| ▲ | senko 2 hours ago | parent | prev | next [-] |
| What's with the editorialized title? The text is actually about the Thoughtworks Future of Software Development retreat. |
| |
| ▲ | nthypes an hour ago | parent [-] | | IMHO, it doesn't, but I have changed the title to avoid any confusion. |
|
|
| ▲ | riffraff 2 hours ago | parent | prev | next [-] |
| I think the title on HN doesn't reflect all that is in TFA, but rather the linked article[0]. Fowler's article is interesting tho. I do like the idea that "all code is tech debt", and we shouldn't want to produce more of it than we need. But it's also worth remembering that debt is not bad per se, buying a house with a mortgage is also debt and can be a good choice for many reasons. [0]: https://thenewstack.io/ai-velocity-debt-accelerator/ |
| |
| ▲ | simonw 2 hours ago | parent | next [-] | | Yeah that editorialized title is entirely wrong for this post. Problem is the real title is "Fragments: February 18" which is no good here either. I suggest something like "Tidbits from the Thoughtworks Future of Software Development Retreat" (from the first sentence, captures the content reasonably well.) | |
| ▲ | eru an hour ago | parent | prev | next [-] | | Tech debt is totally misnamed. 'Tech debt' behaves more like equity than debt: if your project goes nowhere, the 'tech debt' becomes a non-issue. | |
| ▲ | senko 2 hours ago | parent | prev | next [-] | | I like the "cognitive debt" idea outlined here: https://margaretstorey.com/blog/2026/02/09/cognitive-debt/ (from a participant of the retreat) and especially the pithy "velocity without understanding is not sustainable" phrase. | |
| ▲ | nthypes an hour ago | parent | prev [-] | | IMHO, it doesn't, but I have changed the title to avoid any confusion. |
|
|
| ▲ | greymalik an hour ago | parent | prev | next [-] |
The headline misrepresents the source. It’s not the title of the page, not the point of the content, and biases the quote’s context: “if traditional software delivery best practices aren’t already in place, this velocity multiplier becomes a debt accelerator” |
| |
| ▲ | nthypes an hour ago | parent [-] | | IMHO, it doesn't, but I have changed the title to avoid any confusion. |
|
|
| ▲ | acomjean an hour ago | parent | prev | next [-] |
So do we need new abstractions / languages? It seems clear that a lot of things can be pulled together by AI because doing them by hand is tedious for humans. But it seems to indicate that better tooling is needed. |
|
| ▲ | anthonypasq an hour ago | parent | prev | next [-] |
What is up with all this nonsense about token subsidies? Dario in his recent interview with Dwarkesh made it abundantly clear that they have substantial inference margins, and they use that to justify the financing for the next training run. Chinese open source models are dirt cheap; you can buy $20 worth of kimi-k2.5 on opencode and spam it all week and barely make a dent. Assuming we never got bigger models, but hardware keeps improving, we'll either be serving current models for pennies, or at insane speeds, or both. The only actual situation where tokens are being subsidized is free tiers on chat apps, which are largely irrelevant for any sort of useful economic activity. |
| |
| ▲ | simonw an hour ago | parent | next [-] | | There exist a large number of people who are absolutely convinced that LLM providers are all running inference at a loss in order to capture the market and will drive the prices up sky high as soon as everyone is hooked. I think this is often a mental excuse for continuing to avoid engaging with this tech, in the hope that it will all go away. | | |
| ▲ | kingstnap an hour ago | parent | next [-] | | I agree with you, but also the APIs are proper expensive to be fair. What people probably get messed up on as being the loss leader is likely the generous usage limits on flat-rate subscriptions. For example GitHub Copilot Pro+ comes with 1500 premium requests a month. That's quite a lot and it's only $39.00. (Requests ~ Prompts). For some time they were offering Opus 4.6 Fast at 9x billing (now raised to 30x). That was up to 167 requests of around ~128k context for just $39. That ridiculous model costs $30/$150 per Mtok, so you can easily imagine the economics on this. | |
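A rough sense of how those numbers pencil out, using only the figures quoted in the comment above; the per-request token counts are assumptions for illustration:

```python
# Flat-rate subscription vs API list prices, per the figures quoted above.
MONTHLY_PRICE = 39.00               # USD, Copilot Pro+
PREMIUM_REQUESTS = 1500             # per month
BILLING_MULTIPLIER = 9              # Opus "fast" at 9x (since raised to 30x)

INPUT_TOKENS_PER_REQUEST = 128_000  # assumed: a mostly full context window
OUTPUT_TOKENS_PER_REQUEST = 2_000   # assumed
PRICE_IN_PER_MTOK = 30.0            # $ per million input tokens
PRICE_OUT_PER_MTOK = 150.0          # $ per million output tokens

requests = PREMIUM_REQUESTS // BILLING_MULTIPLIER  # ~166 requests per month
per_request = (INPUT_TOKENS_PER_REQUEST / 1e6) * PRICE_IN_PER_MTOK \
            + (OUTPUT_TOKENS_PER_REQUEST / 1e6) * PRICE_OUT_PER_MTOK
print(f"{requests} requests ~ ${requests * per_request:.0f} at list prices, for ${MONTHLY_PRICE:.0f} flat")
```

With those assumptions the same usage would run several hundred dollars at list API prices, which is the gap the comment is pointing at: either the subscription is discounted or the list prices carry a large margin.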
| ▲ | louiereederson an hour ago | parent | prev [-] | | Referring to my earlier comment, you need to have a model for how to account for training costs. If Anthropic stops training models now, what happens to their revenues and margins in 12 months? There's a difference between running inference and running a frontier model company. | | |
| ▲ | simonw an hour ago | parent [-] | | Training costs are fixed. You spend $X-bn training a model and that single model then benefits all of your customers. Inference costs grow with your users. Provided you are making a profit on that inference you can eventually cover your training costs if you sign up enough paying customers. If you LOSE money on inference every new customer makes your financial position worse. |
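A toy illustration of the fixed-versus-variable point above; every number is made up, none are real provider figures:

```python
# Fixed training cost vs per-token inference margin: with a positive margin,
# enough volume eventually covers training; with a negative margin it never does.
TRAINING_COST = 1_000_000_000     # USD per model generation (assumed)
INFERENCE_MARGIN_PER_MTOK = 5.00  # revenue minus serving cost, $/Mtok (assumed)

def months_to_recoup(mtok_per_month: float) -> float:
    """Months of inference needed to cover the fixed training cost."""
    return TRAINING_COST / (mtok_per_month * INFERENCE_MARGIN_PER_MTOK)

for mtok in (1e6, 10e6, 100e6):   # million-token units served per month
    print(f"{mtok:>12,.0f} Mtok/month -> {months_to_recoup(mtok):6.1f} months to recoup")
```

If the per-token margin is negative, the same loop has no answer: more volume only deepens the loss, which is the distinction being drawn here.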
|
| |
| ▲ | louiereederson an hour ago | parent | prev [-] | | Anthropic reduced their gross margin forecast per external reporting (below) to 40%, and have exceeded internal forecasts on inference costs. This does not take into account amortized training costs, which are substantial (well over 50% of revenue) and accounted for as occurring below gross profit. If you view training as a cost of staying in the game, then it is justifiable to view it as at least a partially variable cost that should be accounted for in gross margin, particularly given that the models stay on the leading edge for only a few months. If that's the case then gross margins are probably minimal, or maybe negative. https://www.theinformation.com/articles/anthropic-lowers-pro... |
|
|
| ▲ | empath75 25 minutes ago | parent | prev | next [-] |
So here are a few things I have been thinking of:
---
It's not 2-pizza teams, it's 2-person teams. You no longer need 4 people on a team just working on features off of a queue; you just need 2 people making technical decisions and managing agents.
---
Code used to be expensive to create. It was only economical to write code if it was doing high value work or work that would be repeated many times over a long period of time. Now producing code is _cheap_. You can write and run code in an automated way _on demand_. But if you do that, you have essentially traded upfront cost for run time cost. It's really only worth it if the work is A) high value and B) intermittent. There is probably a formula you can write to figure out where this trade off makes sense and when it doesn't (a rough sketch follows below). I'm working on a system where we can just chuck out autonomous agents onto our platform with a plain text description, and one thing I have been thinking about is tracking those token costs and figuring out how to turn agentic workflows into just normal code. I've been thinking about running an agent that watches the other agents for cost and reads their logs on a schedule to see if any of what the agents are doing can be codified and turned into a normal workflow, and possibly even _writing that workflow itself_. It would be analogous to the JVM optimizing hot-path functions...
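One way to sketch that trade-off; the function and the numbers are hypothetical, and the point is only the shape of the comparison:

```python
# When is it worth "compiling" an agentic workflow into ordinary code?
# Codify once the upfront effort is recovered by cheaper future runs.
def worth_codifying(runs_expected: int,
                    agent_cost_per_run: float,
                    code_cost_per_run: float,
                    codify_cost: float) -> bool:
    savings_per_run = agent_cost_per_run - code_cost_per_run
    return runs_expected * savings_per_run > codify_cost

# e.g. a task that burns ~$0.40 of tokens per agent run vs ~$0.01 as plain
# code, with ~$30 of effort (agent or human) to codify it:
print(worth_codifying(50,  0.40, 0.01, 30.0))   # False: not enough repetition
print(worth_codifying(500, 0.40, 0.01, 30.0))   # True: worth codifying
```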
--- What I do know is that what we are doing for a living will be near unrecognizable in a year or two. |
|
| ▲ | adregan 2 hours ago | parent | prev | next [-] |
| In the section on security: > One large enterprise employee commented that they were deliberately slow with AI tech, keeping about a quarter behind the leading edge. “We’re not in the business of avoiding all risks, but we do need to manage them”. I’m unclear how this pattern helps with security vis-à-vis LLMs. It makes sense when talking about software versions, in hoping that any critical bugs are patched, but prompt injection springs eternal. |
| |
| ▲ | MattGrommes 15 minutes ago | parent | next [-] | | I took this to mean more like not jumping right on OpenClaw, but waiting a quarter or so to give it at least a little time to shake out. There are so many new tools coming out that I think it's beneficial not to be the guinea pig. | |
| ▲ | Quothling an hour ago | parent | prev | next [-] | | I work in a NIS2 regulated sector and I'm not sure we can ever let any AI agent run in anything we do. We have a centralized solution where people can build their own chatbots with various configurations across models. That's in the isolation of the browser though, and while I'm sure employees are putting things into it they shouldn't, at least it's inside our setup and not in whatever chatbot they haven't yet run out of tokens on. Security wise though, I'm not sure how you can meet any form of compliance if you grant an AI access unless you have four-eyes validation on every single action it takes... which is just never going to happen. We've experimented with rolling open source models on local hardware, but it's so easy to inject things into them that it's not really going anywhere. It's going to be a massive challenge, because if we don't provide the tools, employees are going to figure out how to do it on their own. | |
| ▲ | bilekas 2 hours ago | parent | prev [-] | | > but prompt injection springs eternal. Yes, but some are mitigated when discovered, and some more critical areas need to be isolated from the LLM, so taking their time to provision LLMs into their lifecycle is important, and they're happy to spend the time doing it right rather than just throwing the latest edge tech into their system. | |
| ▲ | ethin 2 hours ago | parent [-] | | How exactly can you "mitigate" prompt injections? Given that the language space is for all intents and purposes infinite, and given that you can even circumvent these by putting your injections in hex or base64 or whatever? Like I just don't see how one can truly mitigate these when there are infinite ways of writing something in natural language, and that's before we consider the non-natural languages one can use too. | | |
| ▲ | lambda an hour ago | parent | next [-] | | The only ways that I can think of to deal with prompt injection are to severely limit what an agent can access. * Never give an agent any input that is not trusted * Never give an agent access to anything that would cause a security problem (read only access to any sensitive data/credentials, or write access to anything dangerous to write to) * Never give an agent access to the internet (which is full of untrusted input, as well as places that sensitive data could be exfiltrated) An LLM is effectively an unfixable confused deputy, so the only way to deal with it is to lock it down so it can't read untrusted input and then do anything dangerous. But it is really hard to do any of the things that folks find agents useful for without relaxing those restrictions. For instance, most people let agents install packages or look at docs online, but any of those could be places for prompt injection. Many people allow it to run git and push and interact with their Git host, which allow for dangerous operations. My current experimentation is running my coding agent in a container that only has access to the one source directory I'm working on, as well as the public internet. Still not great as the public internet access means that there's a huge surface area for prompt injection, though for the most part it's not doing anything other than installing packages from known registries where a malicious package would be just as harmful as a prompt injection. Anyhow, there have been various people talking about how we need more sandboxes for agents, I'm sure there will be products around that, though it's a really hard problem to balance usability with security here. | |
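A minimal sketch of the rule those bullets add up to: once anything untrusted enters the agent's context, refuse tools tagged as dangerous. The class and tool names here are hypothetical, not any real framework's API:

```python
# Taint-tracking guard: untrusted input in context blocks dangerous tool calls.
DANGEROUS_TOOLS = {"shell", "git_push", "send_http_request"}

class GuardedAgent:
    def __init__(self):
        self.context_tainted = False  # flips once any untrusted input arrives

    def add_input(self, text: str, trusted: bool) -> None:
        if not trusted:
            self.context_tainted = True
        # ...append text to the model context...

    def call_tool(self, tool_name: str, args: dict) -> None:
        if self.context_tainted and tool_name in DANGEROUS_TOOLS:
            raise PermissionError(f"refusing {tool_name}: context contains untrusted input")
        # ...otherwise dispatch the tool call...

agent = GuardedAgent()
agent.add_input("README fetched from the public internet", trusted=False)
try:
    agent.call_tool("git_push", {})
except PermissionError as e:
    print(e)
```

Like the container approach in the comment, the restriction has to be enforced outside the model, because the model itself can always be talked out of it.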
| ▲ | charcircuit 30 minutes ago | parent | prev | next [-] | | If the model is properly aligned then it shouldn't matter if there is an infinite ways for an attacker to ask the model to break alignment. | |
| ▲ | bilekas an hour ago | parent | prev [-] | | Full mitigation seems impossible to me at least, but the obvious and public sandbox escape prompts that have been discovered and "patched" out just make it more difficult I guess. But afaik it's not possible to fully mitigate. |
|
|
|
|
| ▲ | taeric an hour ago | parent | prev | next [-] |
I really hate that we allowed "debt" to become a synonym for "liability." This isn't a case where you have specific code/capital you have borrowed and need to pay for its use or give it back. This is flat out putting liabilities into your assets that will have to be discovered and dealt with, someday. |
|
| ▲ | mamma_mia an hour ago | parent | prev | next [-] |
Mamma mia! Out with the old, in with the new; soon GitHub will be like a warehouse full of old punchcards |
|
| ▲ | fuzzfactor 2 hours ago | parent | prev | next [-] |
| Looks to me like the people that are filthy rich [0] can afford to move so fast that even the people who are very rich in the regular way can't keep up. [0] Which is not even enough, these are the ones with truly excess money to burn. |
| |
| ▲ | bilekas 2 hours ago | parent [-] | | I'm not sure you read the article, it's not referring to financials, but tech debt. | | |
| ▲ | fuzzfactor an hour ago | parent [-] | | I like Fowler and reviewed it well. Are you assuming tech debt has no financial cost? | | |
| ▲ | bilekas an hour ago | parent [-] | | Oh sure, but it usually doesn't show up on a financial statement, so it just seemed a bit strange to be commenting on the financials is all; maybe I misunderstood your context. |
|
|
|
|
| ▲ | siliconc0w an hour ago | parent | prev | next [-] |
| Even with the latest SOTA models - I still consistently find issues. Performance, security, memory leaks, bad assumptions/instruction following, and even levels of laziness/gaslighting/dishonesty. I spend less time authoring changes but a lot more time reviewing and validating changes. And that is using the best models (Opus 4.6/Codex 5.3), the OSS/flash models are still quite unreliable at solving problems. Token costs are also non-trivial. Claude can exhaust a $20/month session limit with one difficult problem (didn't even write code, just planned). Each engineer needs at least the $200/mo plan - I have multiple plans from multiple providers. |
|
| ▲ | deadbabe an hour ago | parent | prev | next [-] |
| There have been some back of the napkin estimates on what AI could cost from the major platforms once no longer subsidized. It does not look good, as there is a minimum of a 12x increase in costs. Local or self hosted LLMs will ultimately be the future. Start learning how to build up your own AI stack and use it day to day. Hopefully hardware catches up so eventually running LLMs on device is the norm. |
|
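On the "build up your own AI stack" point above, a minimal sketch of the common pattern: most local servers (llama.cpp's llama-server, Ollama, LM Studio, vLLM) expose an OpenAI-compatible endpoint, so day-to-day tooling can just point at localhost. The port and model name below are placeholders:

```python
# Point the standard OpenAI client at a locally hosted model server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # wherever your local server listens
    api_key="not-needed",                 # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="local-model",  # placeholder; use whatever name your server reports
    messages=[{"role": "user", "content": "Summarize this repo's build steps."}],
)
print(resp.choices[0].message.content)
```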
| ▲ | christkv an hour ago | parent | prev | next [-] |
My bet is that the amount of work needed per token generated will decrease over time, and models will become smaller for the same performance as we learn to optimize, so cost and required hardware will go down |
|
| ▲ | clockworkhavoc an hour ago | parent | prev [-] |
| Martin Fowler, longtime associate of the sanctioned fugitive and CCP-backed funder of domestic terrorism, Neville Roy Singham? https://oversight.house.gov/wp-content/uploads/2025/09/Lette... |
| |
| ▲ | g8oz 27 minutes ago | parent [-] | | 1. Singham is not a fugitive from American justice just yet - although refusing to cooperate with Congress may lead him to be. 2. Is it a problem if a rich guy funds activities in America that suspiciously align with a foreign power? That has interesting implications for many pro Israel billionaires and organizations. 3. Only a paranoid MAGA troll would characterize the left wing groups he funds as domestic terrorists. Code Pink? Pro Palestinian protest groups? Come on. |
|