| ▲ | Ask HN: Do you have any evidence that agentic coding works? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 259 points by terabytest 21 hours ago | 240 comments | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've been trying to get agentic coding to work, but the dissonance between what I'm seeing online and what I'm able to achieve is doing my head in. Is there real evidence, beyond hype, that agentic coding produces net-positive results? If any of you have actually got it to work, could you share (in detail) how you did it? By "getting it to work" I mean: * creating more value than technical debt, and * producing code that’s structurally sound enough for someone responsible for the architecture to sign off on. Lately I’ve seen a push toward minimal or nonexistent code review, with the claim that we should move from “validating architecture” to “validating behavior.” In practice, this seems to mean: don’t look at the code; if tests and CI pass, ship it. I can’t see how this holds up long-term. My expectation is that you end up with "spaghetti" code that works on the happy path but accumulates subtle, hard-to-debug failures over time. When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced. Last weekend I tried building an iOS app for pet feeding reminders from scratch. I instructed Codex to research and propose an architectural blueprint for SwiftUI first. Then, I worked with it to write a spec describing what should be implemented and how. The first implementation pass was surprisingly good, although it had a number of bugs. Things went downhill fast, however. I spent the rest of my weekend getting Codex to make things work, fix bugs without introducing new ones, and research best practices instead of making stuff up. Although I made it record new guidelines and guardrails as I found them, things didn't improve. In the end I just gave up. I personally can't accept shipping unreviewed code. It feels wrong. The product has to work, but the code must also be high-quality. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | xsh6942 12 minutes ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It really depends by what you mean by "it works". A retrospective of the last 6months. I've had great success coding infra (terraform). It at least 10x the generation of easily verifiable and tedious to write code. Results were audited to death as the client was highly regulated. Professional feature dev is hit and miss for sure, although getting better and better. We're nowhere near full agentic coding. However, by reinvesting the speed gains from not writing boilerplate into devex and tests/security, I bring to life much better quality software, maintainable and a boy to work with. I suddenly have the homelab of my dreams, all the ideas previously in the "too long to execute" category now get vibe coded while watching TV or doing other stuff. As an old jaded engineer, everything code was getting a bit boring and repetitive (so many rest APIs). I guess you get the most value out of it when you know exactly what you want. Most importantly though, and I've heard this from a few other seniors: I've found joy in making cool fun things with tech again. I like that new way of creating stuff at the speed of thought, and I guess for me that counts as "it works" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | fhd2 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Bear in mind that there is a lot of money riding on LLMs leading to cost savings, and development (seen as expensive and a common bottleneck) is a huge opportunity. There are paid (micro) influencer campaigns going on and what not. Also bear in mind that a lot of folks want to be seen as being on the bleeding edge, including famous people. They get money from people booking them for courses and consulting, buying their books, products and stuff. A "personal brand" can have a lot of value. They can't be seen as obsolete. They're likely to talk about what could or will be, more than about what currently is. Money isn't always the motive for sure, people also want to be considered useful, they want to genuinely play around and try and see where things are going. All that said, I think your approach is fine. If you don't inspect what the agent is doing, you're down to faith. Is it the fastest way to get _something_ working? Probably not. Is it the best way to build an understanding of the capabilities and pit falls? I'd say so. This stuff is relatively new, I don't think anyone has truly figured out how to best approach LLM assisted development yet. A lot of folks are on it, usually not exactly following the scientific method. We'll get evidence eventually. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | zh3 an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I'll bite. Here's my realtime Bluetooth heart rate monitor for linux, with text output and web interface.
This was 100% written by Claude Code, my input was limited to mostly accepting Claude suggestions except a couple of cases where I could make suggestions to speed up development (skipping some tests I knew would work).Particularly interesting because I didn't expect this to work, let along not to write any code. Note that I limited it to pure C with limited dependencies; initial prompt was just to get text output ("Heart Rate 76bpm"), when it got to that point I told Claude to add a web interface followed by creating a realtime graph to show the interface in use. Every file is claude generated. AMA. edit: this was particularly interesting as it had to test against the HRM sensor I was wearing during development, and to cope with bluetooth devices appearing and disappearing all the time. It took about a day for the whole thing and cost around $25. further edit: I am by no means an expert with Claude (haven't even got to making a claude.md file); the one real objective here was to get a working example of using dBus to talk to blueZ in C, something I've failed at (more than once) before. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | resonious 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I think one fatal flaw is letting the agent build the app from scratch. I've had huge success with agents, but only on existing apps that were architected by humans and have established conventions and guardrails. Agents are really bad at architecture, but quite good at following suit. Other things that seem to contribute to success with agents are: - Static type systems (not tacked-on like Typescript) - A test suite where the tests cover large swaths of code (i.e. not just unit testing individual functions; you want e2e-style tests, but not the flaky browser kind) With all the above boxes ticked, I can get away with only doing "sampled" reviews. I.e. I don't review every single change, but I do review some of them. And if I find anything weird that I had missed from a previous change, I to tell it to fix it and give the fix a full review. For architectural changes, I plan the change myself, start working on it, then tell the agent to finish. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | al_borland 9 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A principal engineer at Google posted on Twitter that Claude Code did in an hour what the team couldn’t do in a year. Two days later, after people freaked out, context was added. The team built multiple versions in that year, each had its trade offs. All that context was given to the AI and it was able to produce a “toy” version. I can only assume it had similar trade offs. https://xcancel.com/rakyll/status/2007659740126761033#m My experience has been similar to yours, and I think a lot of the hype is from people like this Google engineer who play into the hype and leave out the context. This sets expectations way out of line from reality and leads to frustration and disappointment. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dagss 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've been programming for 20 years, and I've always been under-estimating how long things will take (no, not pressured by anyone to give firm estimates, just talking about informally when prioritizing work order together). The other day I gave an estimate to my co-worker and he said "but how long is it really going to take, because you always finish a lot quicker than you say, you say two weeks and then it takes two days". The LLMs will just make me finish things a lot faster and my gut feel estimation for how long things will take still is not yet taking that into account. (And before people talk about typing speed: No that isn't it at all. I've always been the fastest typer and fastest human developer among my close co-workers.) Yes, I need to review the code and interact with the agent. But it's doing a lot better than a lot of developers I've worked with over the years, and if I don't like the style of the code it takes very few words and the LLM will "get it" and it will improve it.. Some commenters are comparing the LLM to a junior. In some sense that is right in that the work relationship may be the same as towards a (blazingly fast) junior; but the communication style and knowledge area and how few words I can use to describe something feels more like talking to a senior. (I think it may help that latest 10 years of my career a big part of my job was reviewing other people's code, delegating tasks, being the one who knew the code base best and helping others into it. So that means I'm used to delegating not just coding. Recently I switched jobs and am now coding alone with AI.) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | iamsaitam 11 minutes ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Never forget that the majority of what you see online is biased towards edge framing, being the subject matter incredible or terrible. Just as people curate their online profiles to make their lives appear more "appealing" than they actually are, they do the same curating in other areas. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | defatigable 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I use Augment with Claud Opus 4.5 every day at my job. I barely ever write code by hand anymore. I don't blindly accept the code that it writes, I iterate with it. We review code at my work. I have absolutely found a lot of benefit from my tools. I've implemented several medium-scale projects that I anticipate would have taken 1-2 weeks manually, and took a day or so using agentic tools. A few very concrete advantages I've found: * I can spin up several agents in parallel and cycle between them. Reviewing the output of one while the others crank away. * It's greatly improved my ability in languages I'm not expert in. For example, I wrote a Chrome extension which I've maintained for a decade or so. I'm quite weak in Javascript. I pointed Antigravity at it and gave it a very open-ended prompt (basically, "improve this extension") and in about five minutes in vastly improved the quality of the extension (better UI, performance, removed dependencies). The improvements may have been easy for someone expert in JS, but I'm not. Here's the approach I follow that works pretty well: 1. Tell the agent your spec, as clearly as possible. Tell the agent to analyze the code and make a plan based on your spec. Tell the agent to not make any changes without consulting you. 2. Iterate on the plan with the agent until you think it's a good idea. 3. Have the agent implement your plan step by step. Tell the agent to pause and get your input between each step. 4. Between each step, look at what the agent did and tell it to make any corrections or modifications to the plan you notice. (I find that it helps to remind them what the overall plan is because sometimes they forget...). 5. Once the code is completed (or even between each step), I like to run a code-cleanup subagent that maintains the logic but improves style (factors out magic constants, helper functions, etc.) This works quite well for me. Since these are text-based interfaces, I find that clarity of prose makes a big difference. Being very careful and explicit about the spec you provide to the agent is crucial. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | arjie an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I think I have evidence it works for me in that a bunch of unfinished projects suddenly finished themselves and work for me in the way I want them to. So whatever delta there was between my ideas and my execution, it has been closed for me. If I'm being honest, the people who get utility out of this tool don't need any tutorials. The smattering of ideas that people mention is sufficient. The people who don't get utility out of this tool are insistent that it is useless, which isn't particularly inspiring to the kind of person who would write a good tutorial. Consequently, you're probably going to have to pay someone if you want a handholding. And at the end you might believe it wasn't worth it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | wewewedxfgdf 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
You fundamentally misunderstand AI assisted coding if you think it does the work for you, or that it gets it right, or that it can be trusted to complete a job. It is an assistant not a team mate. If you think that getting it wrong, or bugs, or misunderstandings, or lost code, or misdirections, are AI "failing", then yes you will fail to understand or see the value. The point is that a good AI assisted developer steers through these things and has the skill to make great software from the chaotic ingredients that AI brings to the table. And this is why articles like this one "just don't get it", because they are expecting the AI to do their job for them and holding it to the standards of a team mate. It does not work that way. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | regularfry 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
So review the code. Our rule is that if your name is on the PR, you own the code; someone else will review it and expect you to be able to justify its contents. And we don't accept AI commits. What this means in workflow terms is that the bottleneck has moved, from writing the code to reviewing it. That's forward progress! But the disparity can be jarring when you have multiple thousands of lines of code generated every day and people are used to a review cycle based on tens or hundreds. Some people try to make the argument that we can accept standards of code from AI that we wouldn't accept from a human, because it's the AI that's going to have to maintain it and make changes. I don't accept that: whether you're human or not it's always possible to produce write-only code, and even if the position is "if we get into difficulty we'll just have the agent rewrite it" that doesn't stop you getting into a tarpit in the first place. While we still have a need to understand how the systems we produce work, we need humans to be able to make changes and vouch for their behaviour, and that means producing code that follows our standards. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | sirwhinesalot 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The only approach I've tried that seems to work reasonably well, and consistently, was the following: Make a commit. Give Claude a task that's not particularly open ended, the closer to pure "monkey work" boilerplate nonsense the task is, the better (which is also the sort of code I don't want do deal with myself). Preferably it should be something that only touches a file or two in the codebase unless it is a trivial refactor (like changing the same method call all over the place) Make sure it is set to planning mode and let it come up with a plan. Review the plan. Let it implement the plan. If it works, great, move on to review. I've seen it one-shot some pretty annoying tasks like porting code from one platform to another. If there are obvious mistakes (program doesn't build, tests don't pass, etc.) then a few more iterations usually fix the issue. If there are subtle mistakes, make a branch and have it try again. If it fails, then this is beyond what it can do, abort the branch and solve the issue myself. Review and cleanup the code it wrote, it's usually a lot messier than it needs to be. This also allows me to take ownership of the code. I now know what it does and how it works. I don't bother giving it guidelines or guardrails or anything of the sort, it can't follow them reliably. Even something as simple as "This project uses CMake, build it like this" was repeatedly ignored as it kept trying to invoke the makefile directly and in the wrong folder. This doesn't save me all that much time since the review and cleanup can take long, but it serves a great unblocker. I also use it as a rubber duck that can talk back and documentation source. It's pretty good for that. This idea of having an army of agents all working together on the codebase is hilarious to me. Replace "agents" with "juniors I hired on fiverr with anterograde amnesia" and it's about how well it goes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | victorbjorklund an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I have three uses of agentic coding at this time. All save me time. 1) low risk code Let's say that we're building an MVP for something. and at this moment we just wanna get something working to get some initial feedback. So for example, the front-end code is not going to stick around. we just want something there to give a functionality and a feeling but it doesn't have to be perfect. AI is awesome at creating that kind of front-end code that will just live for a short time before it's probably all thrown out. 2) fast iterations and experimentation In the past, if you had to build something and you were thinking, thinking maybe I can try this thing, then you're gonna spend hours or days getting it up and working to find out if it's even a good idea in the first place. but with AI And I find that I can just ask the AI to quickly get a working feature up and I can realize no this is not the best way to do it remove everything thing start over. I could not do that in the past with limited time to spend and they just doing the same thing over and over again with different libraries or different solutions. But with AI, I can do that. and then when you have something that you like you can go back and do it correctly. 3) typing for me. And lastly, even when I write my own code, I don't really write it but I don't use the AI to to say, "hey, build me a to-do app" instead I use it to just give me the building blocks so more like in very advanced snippet tool so I might say "Can you give me a gen server that takes in this and that and returns this and that?" And then of course I review the result. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | kristopolous 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
For me, the only metric that matters is wall-time between initial idea and when it's solid enough that you don't have to think about it. Agentic coding is very similar to frameworks in this regard: 1. If the alignment is right, you have saved time. 2. If it's not right, it might take longer. 3. You won't have clear evidence of which of these cases applies until changing course becomes too expensive. 4. Except, in some cases, this doesn't apply and it's obvious... Probably.... I have a (currently dormant) project https://onolang.com/ that I need to get back to that tries to balance these exact concerns. It's like half written. Go to the docs part to see the idea. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | mvanzoest 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've found it useful for getting features started and fixing bugs, but it depends on the feature. I use Claude Sonnet 4.5 and it usually does a pretty good job on well-known problems like setting up web sockets and drag and drop UIs which would take me much longer to do by hand. It also seems to follow examples well of existing patterns in my codebase like router/service/repository implementations. I've struggled to get it to work well for messy complicated problems like parsing text into structured objects that have thousands of edge cases and in which the complexity gets out of hand very quickly if not careful. In these cases I write almost all the code by hand. I also use it for writing ad-hoc scripts I need to run once and are not safety critical, in which case I use it's code as-is after a cursory review that it is correct. Sometimes I build features I would otherwise be too intimidated to try if doing by hand. I also use it to write tests, but I usually don't like it's style and tend to simplify them a lot. I'm sure my usage will change over time as I refine what works and what doesn't for me. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | edude03 20 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I have the same experience despite using claude every day. As an funny anecdote: Someone I know wrote the code and the unit tests for a new feature with an agent. The code was subtly wrong, fine, it happens, but worse the 30 or so tests they added added 10 minutes to the test run time and they all essentially amounted to `expect(true).to.be(true)` because the LLM had worked around the code not working in the tests | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | StrangeSound an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I think it depends on what you consider structurally sound. I built https://skipthestamp.co.uk/ this weekend. Not only did I not write any code, I did it all from my phone! I'm in the middle of writing a blog post about the process. This is obviously a _very_ simple website, but in my opinion there's no argument that agentic coding works. You're in control of the sandbox. If you don't set any rules or guidelines to follow, the LLM will produce code you're not happy with. As with anything, it's process. If you're building a feature and it has lots of bugs, there's been a misstep. More testing required before moving onto feature 2. What makes you say "unreviewed code"? Isn't that your job now you're no longer writing it? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | atraac an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In my case, I use JetBrains Junie(it uses various models underneath) and it mostly works fine, but I don't vibe code entire products with it, I just give it easy, neatly defined tasks. Improve readme, re-implement something using the same approach as X, create a script that does Y etc. It's fairly good at making one-off tools I need, we messed up something and need a tool to f.e. fill up db with missing records? I'll just make a console app that does this. We need a simple app that will just run somewhere and do one thing(like listen to new files and insert something to a db). It's perfectly fine for that. I wouldn't trust it with day to day job, features, anything more advanced(I do mostly backend work). I also have to verify thoroughly what it ends up doing, it tends to mess up sometimes but since most of what it does is non-critical, I don't mind. I too don't believe the claims of people who just straight up Claude they way through a codebase to do 'grand' things but I have a small sample. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | fotcorn 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I used Claude Opus 4.5 inside Cursor to write RISC-V Vector/SIMD code. Specifically Depthwise Convolution and normal Convolution layers for a CNN. I started out by letting it write a naive C version without intrinsic, and validated it against the PyTorch version. Then I asked it (and two other models, Gemini 3.0 and GPT 5.1) to come up with some ideas on how to make it faster using SIMD vector instructions and write those down as markdown files. Finally, I started the agent loop by giving Cursor those three markdown files, the naive C code and some more information on how to compile the code, and also an SSH command where it can upload the program and test it. It then tested a few different variants, ran it on the target (RISC-V SBC, OrangePI RV2) to check if it improves runtime, and then continue from there. It did this 10 times, until it arrived at the final version. The final code is very readable, and faster than any other library or compiler that I have found so far. I think the clear guardrails (output has to match exactly the reference output from PyTorch, performance must be better than before) makes this work very well. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | everfrustrated 10 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Learning how to drive the models is a legit skill - and I don't mean "prompt engineering". There are absolutely techniques that help and because things are moving fast there is little established practice to draw from. But it's also been interesting seeing experienced coders struggle - I've found my time as a manager has been more help to me than my time as a coder. How to keep people on task and focused etc is very similar to managing humans. I suspect much of the next 5 years will be people rediscovering existing human and project management techniques and rebranding them as AI something. Some techniques I've found useful recently: - If the agent struggled on something once it's done I'll ask it "you were struggling here, think about what happened and if there are is anything you learned. Put this into a learnings document and reference it in agents.md so we don't get stuck next time" - Plans are a must. Chat to the agent back and forth to build up a common understanding of the problem you want solved. Make sure to say "ask me any follow up questions you think are necessary". This chat is often the longest part of the project - don't skimp on it. You are building the requirements and if you've ever done any dev work you understand how important having good requirements are to the success of the work. Then ask the model to write up the plan into an implementation document with steps. Review this thoroughly. Then use a new agent to start work on it. "Implement steps 1-2 of this doc". Having the work broken down into steps helps to be able to do work more pieces (new context windows). This part is the more mindless part and where you get to catch up on reading HN :) - The GitHub Copilot chat agent is great. I don't get the TUI folks at all. The Pro+ plan is a reasonable price and can do a lot with it (Sonnet, Codex, etc all available). Being able to see the diffs as it works is helpful (but not necessary) to catch problems earlier. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | Garlef an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> producing code that’s structurally sound enough for someone responsible for the architecture to sign off on 1. It helps immensely if YOU take responsibility for the architecture. Just tell the agent not only what you want but also how you want it. 2. Refactoring with an agent is fast and cheap. 0. ...given you have good tests --- Another thing: The agents are really good at understanding the context. Here's an example of a prompt for a refactoring task that I gave to codex. it worked out great and took about 15 minutes. It mentions a lot of project specific concepts but codex could make sense of it. """ we have just added a test backdoor prorogate to be used in the core module. let's now extract it and move it into a self-contained entrypoint in the testing module (adjust the exports/esbuilds of the testing module as needed and in accordance with the existing patterns in the core and package-system modules). this entrypoint should export the prorogate and also create its environment refactor the core module to use it from there then also adjust the ui-prototype and package system modules to use this backdoor for cleanup """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | CurleighBraces 23 minutes ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I feel there's two contradicting statements in this post. > Is there evidence that agentic coding works? Yes plenty, tons, and growing every single day, people are producing code and tooling that works for them and is providing them value. My linkedin is even starting to show me none-programmers knocking up web front ends for us, and my brother who is a builder is now drawing up requirement specs for software! > Is the code high-quality Only if you are really careful, and constantly have the human in the loop guiding the agent, and it's not easy. This is where domain expertise and experience come in. Do I think it's possible, yes, do I think there are a ton of good examples out there, absolutely not. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | nl an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I use agentic coding daily and rarely write any code by hand. Here's what works for me: Spend a lot of time working out plans. If you have a feature, get Claude Opus to build a plan, then ask it "How many github issues should this be", and get it to create those issues. Then for each issue ask it to plan the implementation, then update the issue. Then get it to look at all the issues for the feature and look for inconsistencies. Once this is done, you can add you architectural constraints. If you think one issue looks like it could potentially reinvent something, edit that issue to point it at the existing implementation. Once you are happy with the plan, assign to your agents and wait. Optionally you can watch them - I find this quite helpful because you do see them go offtrack sometimes and can correct. As they finish, run a separate review agent. Again, if you have constraints make sure the agent enforces them. Finally, do an overall review of the feature. This should be initially AI assisted. Don't get frustrated when it does the wrong thing - it will! Just tell it how to do the correct thing, and add that to your AGENTS.md so next time it will do it. Consider adding it to your issue template manually too. In terms of code review, I manually review critical calculations line-by-line, and do a broad sweep review over the rest. That broad sweep review looks for duplicate functionality (which happens a lot) and for bad test case generation. I've found this methodology speeds up the coding task around 5-10x what I could do before. Tasks that were 5-10 days of work are now doable in around 1 day. (Overall my productivity increase is a lot higher because I don't procrastinate dealing with issues I want to avoid). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | borzi 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I am working on a game side project where I started to force myself not to look at the code (the foundation was written by hand). I just recently launched a big update, all written with agentic code. I really enjoy it and do believe it's the future...plug for reference: https://thefakeborzi.itch.io/tower-chess/devlog/1321604/the-... | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | cwoolfe 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hang in there. Yes it is possible; I do it every day. I also do iOS and my current setup is: Cursor + Claude Opus 4.5. You still need to think about how you would solve the problem as an engineer and break down the task into a right-sized chunk of work. i.e. If 4 things need to change, start with the most fundamental change which has no other dependencies. Also it is important to manage the context window. For a new task, start a new "chat" (new agent). Stay on topic. You'll be limited to about five back-and-forths before performance starts to suffer. (cursor shows a visual indicator of this in the for of the circle/wheel icon) For larger tasks, tap the Plan button first, and guide it to the correct architecture you are looking for. Then hit build. Review what it did. If a section of code isn't high-quality, tell Claude how to change it. If it fails, then reject the change. It's a tool that can make you 2 - 10x more productive if you learn to use it well. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jpc0 an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In my experience agentic coding is still bad a context management. I can get amazing results from a chatbot based workflow where I manually provide any context needed, if the agent can happily pull files into context on it's own it tends to pull in too much or completely irrelevant stuff and the quality of the output suffers. It's significantly faster in many cases that having written the code by hand myself. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | linesofcode 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
When you first began learning how to program were you building and shipping apps the next day? No. Agentic programming is a skill-set and a muscle you need to develop just like you did with coding in the past. Things didn’t just suddenly go downhill after an arbitrary tipping point - what happened is you hit a knowledge gap in the tooling and gave up. Reflect on what went wrong and use that knowledge next time you work with the agent. For example, investing the time in building a strong test suite and testing strategy ahead of time which both you and the agent can rely on. Being able to manage the agent and getting quality results on a large, complex codebase is a skill in itself, it won’t happen over night. It takes practice and repetition with these tools to level-up, just like any thing else. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | SatvikBeri 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sure, here are my own examples: * I came up with a list of 9 performance improvement ideas for an expensive pipeline. Most of these were really boring and tedious to implement (basically a lot of special cases) and I wasn't sure which would work, so I had Claude try them all. It made prototypes that had bad code quality but tested the core ideas. One approach cut the time down by 50%, I rewrote it with better code and it's saved about $6,000/month for my company. * My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier. * When AWS RDS costs spiked one month, I set Claude Code to investigate and it found the reason was a misconfigured backup setting * I'll use Claude to throw together a bunch of visualizations for some data to help me investigate * I'll often give Claude the type signature for a function, and ask it to write the function. It generally gets this about 85% right | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | CJefferson 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I had a fairly big custom Python 2 static website generator ( github.com/csplib/csplib ), which I'd about given up transfering to Python 3, after a couple of aborted attempts. My main issue was that the libraries I was using didn't have Python 3 versions. AN AI managed to do basically the whole transfer. One big help is I said "The website output of the current version should be identical", so I had an easy way to test for correctness (assuming it didn't try cheating by saving the website of course, but that's easy for me to check for) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | lukebechtel 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1. Start with a plan. Get AI to help you make it, and edit. 2. Part of the plan should be automated tests. AI can make these for you too, but you should spot check for reasonable behavior. 3. Use Claude 4.5 Opus 4. Use Git, get the AI to check in its work in meaningful chunks, on its own git branch. 5. Ask the AI to keep am append-only developer log as a markdown file, and to update it whenever its state significantly changes, or it makes a large discovery, or it is "surprised" by anything. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | mdavid626 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
My colleague coded a feature with Code Claude in a day. The code looks good, also seemingly works. The code was reviewed and pushed out to production. The problem: there is no way, he verified the code in any way. The business logic behind the feature would take probably few days to check for correctness. But if it looks good -> done. Let the customer check it. Of course, he claims “he reviewed it”. It feels to me, we just skip doing half the things proper senior devs did, and claim we’re faster. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | emilecantin 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A loop I've found that works pretty well for bugs is this: - Ask Claude to look at my current in-progress task (from Github/Jira/whatever) and repro the bug using the Chrome MCP. - Ask it to fix it - Review the code manually, usually it's pretty self-contained and easy to ensure it does what I want - If I'm feeling cautious, ask it to run "manual" tests on related components (this is a huge time-saver!) - Ask it to help me prepare the PR: This refers to instructions I put in CLAUDE.md so it gives me a branch name, commit message and PR description based on our internal processes. - I do the commit operations, PR and stuff myself, often tweaking the messages / description. - Clear context / start a new conversation for the next bug. On a personal project where I'm less concerned about code quality, I'll often do the plan->implementation approach. Getting pretty in-depth about your requirements ovbiously leads to a much better plan. For fixing bugs it really helps to tell the model to check its assumptions, because that's often where it gets stuck and create new bugs while fixing others. All in all, I think it's working for me. I'll tackle 2-3 day refactors in an afternoon. But obviously there's a learning curve and having the technical skills to know what you want will give you much better results. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | drcxd 25 minutes ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
news.ycombinator.com/item?id=46670279 Recently there was this post which is largely generated by Claude Code. Read it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | thayne 4 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
My experience has been it does pretty well at writing a "rough draft" with sufficiently good instructions (in particular, telling it a general direction of how to implement it, rather than just telling it to what the end goal is). Then maybe do one or two passes at having the agent improve on that draft, then fix the rest by hand. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | matt3D 17 minutes ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Your metric for “getting it to work” is wrong. Developing software is a means to an end, not a goal in and of itself. The simplest metric you should be tracking is; has it generated income. In this sense my use of agentic coding has performed very well. People asking for evidence and repos are a little naive to how capitalism works in the real world. If I’m making money on something I’m not going to let you copy it, and I’m sure as hell not going to devalue it by publicising that it was built by AI. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | proc0 20 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
My experience is the same. In short, agents cannot plan ahead, or plan at a high level. This means they have a blindspot for design. Since they cannot design properly, it limits the kind of projects that are viable to smaller scopes (not sure exactly how small but in my experience, extremely small and simple). Anything that exceeds this abstract threshold has a good chance of being a net negative, with most of the code being unmantainable, unextensible, and unreliable. Anyone who claims AI is great is not building a large or complex enough app, and when it works for their small project, they extrapolate to all possibilities. So because their example was generated from a prompt, it's incorrectly assumed that any prompt will also work. That doesn't necessarily follow. The reality is that programming is widely underestimated. The perception is that it's just syntax on a text file, but it's really more like a giant abstract machine with moving parts. If you don't see the giant machine with moving parts, chances are you are not going to build good software. For AI to do this, it would require strong reasoning capabilities, that lets it derive logical structures, along with long term planning and simulation of this abstract machine. I predict that if AI can do this then it will be able to do every single other job, including physical jobs as it would be able to reason within a robotic body in the physical world. To summarize, people are underestimating programming, using their simple projects to incorrectly extrapolate to any possible prompt, and missing the hard part of programming which involves building abstract machines that work on first principles and mathematical logic. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | plastic3169 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I have started to use it to write small throwaway things. Like write a standalone debug shader that can display all this state on top of this image in real time. Not in a million years would I had spent time to mess with fonts in a shading language or bring in immediate gui framework or such. Codex could oneshot that kind of thing and the blast radius is one file that is not part of the project. Or write a separate python program that implements this core logic and double check my thinking. I am not a professional programmer though. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | never_giveup an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
if my understanding is correct, @steipete is using exclusively coding agents, and imo he does a good job (see clawd) https://steipete.me/posts/2025/shipping-at-inference-speed | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | JamesTRexx an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've used Mistral via duck.ai for my hobby project a few times to convert C++ code snippets to C, which it did well and made it more clear what the code was actually doing. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dostick 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Coding agent is a perfect simulation of a junior developer working under you. Developer that will tell you - “yes I can do that” about any language and any problem and will never ask you any questions trying very hard to appear competent. Your job is to put them in constraints and give granular and clear tasks. Be aware that junior developer has very basic knowledge about architecture. The good is that it does not simulate that part when developer tries shift blame or pin it on you. Because you’re to blame at all times. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jillesvangurp 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
All of this is subjective. What does it mean for code to be high quality? If you can express that in a form that can be easily tested, you can just instruct an agentic coding tool to do something about it. Most of my experience is with codex. Everytime I catch it doing something I don't like, I try to codify it in a skill or in my Agents.md or some test. I've been using codex specifically to work on addressing technical debt in my own code bases. There's a lot of stuff I never got around to fixing that I'm now actually addressing. Because it stopped being a monster project that would take weeks. You can actually nudge a code base in the right direction with agentic coding tools. The same things that make it hard for people to iterate on code bases (complexity, technical debt, poor architectural decisions, etc.) also make it hard for LLMs to work on code bases. So, as soon as you start working on making those things better, you might get better results. If you have a lot of regressions when iterating with an LLM, you don't have good enough regression tests. If code produces runtime type errors, maybe use something with a better type checker that can remove those bugs before they happen. If you see a lot of duplication, tell it to do something about it and/or use code quality tools that flag such issues and tell it to address those issues. This stuff requires a bit of discipline and skill. But they are fixable things. And the usual excuse that you can't be bothered doesn't apply here; just make the coding tools fix this for you. As for evidence, the amount of dollars being spent by well respected people in the industry on these tools is increasing. That might not be the evidence you like but it's a clear indication that people are getting some value out of these tools. I'm definitely getting more predictable results. I find myself merging most proposed changes after a few iterations. The percentage is trending up in the last months. I can only speak for myself. But essentially everybody I know and respect is using this stuff at this point. With very mixed results. But people are getting shit done. I think there are lots of things to improve with these tools. I'd like them to be faster and require less micro management. I'd like them to work across multiple repositories and issue trackers instead of suffering from perpetual tunnel vision. Mostly when I get bad results, it's a context problem. Some of these things are frustrating to fix. But in the end this is about good feedback loops, not about models magically getting what you want. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | furyofantares 10 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> When I tried using Codex on my existing codebases, with or without guardrails, half of my time went into fixing the subtle mistakes it made or the duplication it introduced. If you want to get good at this, when it makes subtle mistakes or duplicates code or whatever, revert the changes and update your AGENTS.md or your prompt and try again. Do that until it gets it right. That will take longer than writing it yourself. It's time invested in learning how to use these and getting a good setup in your codebase for them. If you can't get it to get it right, you may legitimately have something it sucks at. Although as you iterate might also have some other insights into why it keeps getting it wrong and can maybe change something more substantial about your setup to make it able to get it right. For example I have a custom xml/css UI solution that draws inspiration both from XML and SwiftUI, and it does an OK job of making UIs for it. But sometimes it gets stuck in ways it wouldn't if it was using HTML or some known (and probably higher quality/less buggy) UI library. I noticed it keeps trying things, adding redundant markup to both the xml and css, using unsupported attributes that it thinks should exist (because they do in HTML/CSS), and never cleans up on the way. Some amount of fixing up its context made it noticeably better at this but it still gets stuck and makes a mess when it does. So I made it write a linter and now it uses the linter constantly which keeps it closer to on the rails. Your pet feeding app isn't in this category. You can get a substantial app pretty far these days without running into a brick wall. Hitting a wall that quickly just means you're early on the learning curve. You may have needed to give it more technical guidance from the start, and have it write tests for everything, make sure it makes the app observable to itself in some way so it can see bugs itself and fix them, stuff like that. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | baxtr an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I’ve started asking people to show me their products. I don’t get good answers so far. Maybe that’s just my small sample size / bubble of people I interact with. My thinking is that showcasing / interacting with products built by LLMs will tell you a lot about code quality AND maintenance. I mean it’s easy to spin up static websites. It’s a whole another thing to create, maintain and iteratively edit/improve an actual digital product over time. That’s where the cracks will show up. That also might be the core problem of agentic code: it’s fairly fresh so you won’t see products that have been maintained for long. Thus, my current summary is: it’s great for prototyping and probably for specific tasks like test case generation but it’s not something you want to use when working on a multiyear product/project. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | DustinBrett 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've been having good results lately/finally with Opus 4.5 in Cursor. It still isn't one-shotting my entire task, but the 90% of the way it gets me is pretty close to what I wanted, which is better than in the past. I feel more confident in telling it to change things without it making it worse. I only use it at work so I can't share anything, but I can say I write less code by hand now that it's producing something acceptable. For sysops stuff I have found it extremely useful, once it has MCP's into all relevant services, I use it as the first place I go to ask what is happening with something specific on the backend. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | avereveard 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Define "works" Easiest way to get value is building tests. These don't ship. You can get value from LLM as an additional layer of linting. Reviews don't ship either. You can use LLM for planning. They can quickly scan across the codebases catch side effects of proposed changes or do gap analysis from the desired state. Argumenting that agentic coding must be on or off seem very limiting. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | kr1shna4garwal 4 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've been using agentic coding tools for the past year and a half, and the pattern I've observed is that they work best when they're treated as a very fast, very knowledgeable junior developer, not completely as "autonomous engineer". When I try to give agents broad architectural tasks, they flounder. When I constrain them to small, well-defined units of work within an existing architecture, they can produce clean, correct code surprisingly often. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | rsyring 7 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Track Mitsuhiko's work and blog posts: https://news.ycombinator.com/user?id=the_mitsuhiko https://lucumr.pocoo.org/about/ He has an extensive and impressive body of work in Python and Rust pre LLMs. He's now working on his own startup and doing much of it with AI and documenting his journey. I trust his opinions even though I don't use LLMs as much as he does. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | orwin 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I still think it's useful, but you have to make a heavy use of the 'plan' mode. I still ask the new hires to avoid doing more than just the plan (or at most generating tests cases), so they can understand the codebase before generating new code inside. Basically my point of view is that if you don't feel comfortable reviewing your coworkers code, you shouldn't generate code with AI, because you will review it badly and then I will have to catch the bugs and fix it (happened 24 hours ago). If you generate code, you better understand where it can generate side effects. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | julianeon 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
As far as I can tell, there are exactly 3 use cases that have demonstrably worked with AI, in the sense that their stakeholders (not the AI companies, the users) swear it works. 1. training a RAG on support questions for chat or documentation, w/good material 2. people doing GTM work in marketing, for things like email automation 3. people using a combination of expensive tools - Claude + Cursor + something else (maybe n8n, maybe a custom coding service) - to make greenfield apps | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | mdavid626 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Any real senior devs here using agentic coding? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jasondigitized 13 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Depending on the risk profile of the project, it absolutely works with amazing productivity gains. And the agent of today is the worst agent if will ever be because tomorrow its going to be even better. I am finding amazing results with the ideate -> explore -> plan -> code -> test loop. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | bigcloud1299 11 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I have written a software which I wanted to do so for past 7-8 years within past 3 months. I have over 6000 pages of conversations between me and chatgpt Claude Gemini. And hoping to get patent soon. It consists of over to 260k loc, works well it is architected to support many different industries without much changes but configuration and has very good headed and headless qa coverage. I have spent about 16-18 hours a day because I am so bought into the idea and the outcome I am getting. My patent lawyer suggested getting provisional patent on my work. So for me it works | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jsemrau 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
For me its a major change for personal projects. That said, since about 3 months ago VS Code Github Copilot is remarkably stable in working with existing code base and I could implement changes to those projects that would have taken me a substantially longer time. So at least for this use-case its there. Hidden game changers are Gradio/Streamlit for easy UI. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | stavros 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes, agentic coding works and has massive value. No, you can't just deploy code unreviewed. Still takes much less time for me to review the plan and output than write the code myself. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | devalexwells 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I have had similar questions, and am still evaluating here. However, I've been increasingly frustrated with the sheer volume of anecdotal evidence from yay and naysayers of LLM-assisted coding. I have personally felt increased productivity at times with it, and frustrations at others. In order to better research, I built (ironically, mostly vibe coded) a tool to run structured "self-experiments" on my own usage of AI. The idea is I've init a bunch of hypotheses I have around my own productivity/fulfillment/results with AI-assisted coding. The tool lets me establish those then run "blocks" where I test a particular strategy for a time period (default 2 weeks). So for example, I might have a "no AI" block followed by a "some AI" block followed by a "full agent all-in AI block". The tool is there to make doing check-ins easier, basically a tiny CLI wrapper around journaling that stays out of my way. It also does some static analysis on commit frequency, code produced, etc. but I haven't fleshed out that part of it much and have been doing manual analysis at the end of blocks. For me this kind of self-tracking has been more helpful than hearsay, since I can directly point to periods where it was working well and try to figure out why or what I was working on. It's not fool-proof, obviously, but for me the intentionality has helped me get clearer answers. Whether those results translate beyond a single engineer isn't a question I'm interested in answering and feels like a variant of developer metrics-black-hole, but maybe we'll get more rigorous experiments in time. The tool open source here (may be bugs, only been using it a few weeks): https://github.com/wellwright-labs/devex | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | colechristensen an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There's an impedance mismatch between some people and LLMs and I think one of the major reasons boils down to having preconceived notions about how it should get things done and being frustrated and disappointed when it doesn't. If you explore how to get the best out of it you can by trying many different kinds of ways to interact with it you'll have much more success. I have had similar problems with colleagues who couldn't abide by others solving problems in ways that they disagreed with and would only be agreeable coworkers if they thought in the same way. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | vessenes 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yep, it works. Like anything getting the most out of these tools is its own (human) skill. With that in mind, a couple of comments - think of the coding agents as personalities with blind spots. A code review by all of them and a synthesis step is a good idea. In fact currently popular is the “rule of 5” which suggests you need the LLM to review five times, and to vary the level of review, e.g. bugs, architecture, structure, etc. Anecdotally, I find this is extremely effective. Right now, Claude is in my opinion the best coding agent out there. With Claude code, the best harnesses are starting to automate the review / PR process a bit, but the hand holding around bugs is real. I also really like Yegge’s beads for LLMs keeping state and track of what they’re doing — upshot, I suggest you install beads, load Claude, run ‘!bd prime’ and say “Give me a full, thorough code review for all sorts of bugs, architecture, incorrect tests, specification, usability, code bugs, plus anything else you see, and write out beads based on your findings.” Then you could have Claude (or codex) work through them. But you’ll probably find a fresh eye will save time, e.g. give Claude a try for a day. Your ‘duplicated code’ complaint is likely an artifact of how codex interacts with your codebase - codex in particular likes to load smaller chunks of code in to do work, and sometimes it can get too little context. You can always just cat the relevant files right into the context, which can be helpful. Finally, iOS is a tough target — I’d expect a few more bumps. The vast bulk of iOS apps are not up on GitHub, so there’s less facility in the coding models. And any front end work doesn’t really have good native visual harnesses set up, (although Claude has the Claude chrome extension for web UIs). So there’s going to be more back and forth. Anyway - if you’re a career engineer, I’d tell you - learn this stuff. It’s going to be how you work in very short order. If you’re a hobbyist, have a good time and do whatever you want. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | tom_m 5 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes. Can I share it? No, sadly. It definitely works - but I think sometimes expectations are too high is all. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | PlatoIsADisease 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Since we are on this topic, how would I make an agent that does this job: I am writing an automation software that interfaces with a legacy windows CAD program. Depending on the automation, I just need a picture of the part. Sometimes I need part thickness. Sometimes I need to delete parts. Etc... Its very much interacting with the CAD system and checking the CAD file or output for desired results. I was considering something that would take screenshots and send it back for checks. Not sure what platforms can do this. I am stumped how Visual Studio works with this, there are a bunch of pieces like servers, agents, etc... Even a how-to link would work for me. I imagine this would be extremely custom. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jaxn 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I have a small-ish vertical SaaS that is used heavily by ~700 retail stores. I have enabled our customer success team to fix bugs using GitHub copilot. I approve the PRs, but they have fixed a surprising number of issues. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 8note 10 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
try other harnesses than codex. ive had more success with review tools, rather than the agent getting the code quality right the first time. current workflow 1. specs/requirements/design, outputting tasks 2. implementation, outputting code and tests 3. run review scripts/debug loops, outputting tasks 4. implement tasks 5. go back to 3 the quality of specs, tasks, and review scripts make a big difference one of the biggest things that gets the results better is if you can get a feedback loop in from what the app actually does back to the agent. good logs, being able to interact/take screenshots a la playwright etc guidelines and guardrails are best if theyre tools that the agent runs, or that run automatically to give feedback. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | tacone 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The way I see it, is that for non-trivial things you have to build your method piece by piece. Then things start to improve. It's a process of... developing a process. Write a good AGENTS.md (or CLAUDE.md) and you'll see that code is more idiomatic. Ask it to keep a changelog. Have the LLM write a plan before starting code. Ask it to ask you questions. Write abstraction layers it (along with the fellow humans of course) can use without messing with the low-level detail every time. In a way you have to develop a framework to guide the LLM behavior. It takes time. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | d0100 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
My gf manages to get paid using Cursor/Copilot, despite not being able to branch herself out of a loop In my experience Copilots work expertly at CRUD'ing inside a well structured project, and for MVPs in languages you aren't an expert on (Rust, C/C++ in my case) The biggest demerit is that agents are increasingly trying to be "smart" and using powershell search/replace or writing scripts to skimp on tokens, with results that make me unreasonably angry I tried adding i18n to an old react project, and copilot used all my credits + 10 USD because it kept shitting everything up with its maddening, idiotic use if search replace If it had simply ingested each file and modified them only once, it would have been cheaper As you can tell, I am still salty about it | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | ikidd 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
If you're building something new, stick with languages/problems/projects that have plenty of analogues in the opensource world and keep your context windows small, with small changes. One-shotting an application that is very bespoke and niche is not going to go well, and same goes for working on an existing codebase without a pile of background work on helping the model understand it piece by piece, and then restricting it to small changes in well-defined areas. It's like teaching an intern. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | runjake 7 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> The product has to work, but the code must also be high-quality. I understand and admire your commitment to code quality. I share similar ideals. But it's 2026 and you're asking for evidence that agentic coding works. You're already behind. I don't think you're going to make it. Your competitors are going to outship you. In most cases, your customers don't care about your code. They only want something that works right. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | entropyneur 2 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I am with you on this, although I was able to ship with Aider before as it uses less autonomous approach than the current wave of agentic tools. I don't even care about abstract code quality. To me code quality means maintainability. If the agents are able to maintain the mess they are spewing out, that's quality code to me. We are decidedly not there yet though. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | traceroute66 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I am in the same boat as you. The only positive antigenic coding experience I had was using it as a "translator" from some old unmaintained shell + C code to Go. I gave it the old code, told it to translate to Go. I pre-installed a compiled C binary and told it to validate its work using interop tests. It took about four hours of what the vibecoding lovers call "prompt engineering" but at the end I have to admit it did give me a pretty decent "translation". However for everything else I have tried (and yes, vibecoders, "tried" means very tightly defined tasks) all I have ever got is over-engineered vibecoding slop. The worst part of of it is that because the typical cut-off window is anywhere between 6–18 months prior, you get slop that is full of deprecated code because there is almost always a newer/more efficient way to do things. Even in languages like Go. The difference between an AI-slop answer for Go 1.20 and a human coded Go 1.24/1.25 one can be substantial. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | julianozen 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
My main rule is never to commit code you don’t understand because it’ll get away from you. I employ a few tricks: 1- I avoid auto-complete and always try to read what it does before committing. When it is doing something I don’t want, I course correct before it continues 2- I ask the LLM questions about the changes it is making and why. I even ask it to make me HTML schema diagrams of the changes. 3- I use my existing expertise. So I am an expert Swift developer, and I use my Swift knowledge to articulate the style of what I want to see in TypeScript, a language I have never worked in professionally. 4- I add the right testing and build infrastructure to put guardrails on its work. 5- I have an extensive library of good code for it to follow. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | afavour 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I’ve heard coding agents best described as a fleet of junior developers available to you 24/7 and I think that’s about right. With the added downside that they don’t really learn as they go so they will forever be junior developers (until models get better). There are projects where throwing a dozen junior developers at the problem can work but they’re very basic CRUD type things. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | EagnaIonat 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is anecdotal and maybe reflects what other people are seeing. If you know the field you want it to work in, then it can augment what you do very well. Without that they all tend to create hot garbage that looks cool to a layperson. I would also avoid getting it to write the whole thing up front. Creating a project plan and requirements can help ground them somewhat. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | cyrusradfar 4 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've built multiple new apps with it and manage two projects that I wrote. I barely write any code other than frontend, copy, etc. One is a VSCode extension and has thousands of downloads across different flavors of the IDE -- won't plug it here to spare the downvotes ;) Been a developer professionally for nearly 20 years. It is 100% replacing most of the things I used to code. I spend most of my time while it's working testing what it's built to decide on what's next. I also spend way more time on DX of my own setup, improving orchestration, figuring out best practice guidance for the Agent(s), and building reusable tools for my Agents (MCP). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 7777332215 16 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Don't use it myself. But I have a client who uses it. The bugs it creates are pretty funny. Constantly replacing parts of code with broken or completely incorrect things. Making things that previously worked broken. Deleting random things. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | ammmir 7 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
When you have a hammer, everything looks like a nail. Ad nauseam. AI has made it possible for me to build several one-off personal tools in the matter of a couple of hours and has improved my non-tech life as a result. Before, I wouldn't even have considered such small projects because of the effort needed. It's been relieving not to have to even look at code, assuming you can describe your needs in a good prompt. On the other hand, I've seen vibe coded codebases with excessive layers of abstraction and performance issues that came from a possibly lax engineering culture of not doing enough design work upfront before jumping into implementation. It's a classic mistake, that is amplified by AI. Yes, average code itself has become cheap, but good code still costs, and amazing code, well, you might still have an edge there for now, but eventually, accept that you will have to move up the abstraction stack to remain valuable when pitted against an AI. What does this mean? Focus on core software engineering principles, design patterns, and understanding what computer is doing at a low level. Just because you're writing TypeScript doesn't mean you shouldn't know what's happening at the CPU level. I predict the rise in AI slop cleanup consultancies, but they'll be competing with smarter AIs who will clean up after themselves. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | nathan_compton 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I think of coding agents more like "typing assistants" than programmers. If you know exactly what and how to do what you want, you can ask them to do it with clear instructions and save yourself the trouble of typing the code out. Otherwise, they are bad. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dpcan 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes, constantly. I don’t know what I do differently, but I can get Cursor to do exactly what I want all the time. Maybe it’s because it takes more time and effort, and I don’t connect to GitHub or actual databases, nor do I allow it to run terminal commands 99% of the time. I have instructions for it to write up readme files of everything I need to know about what it has done. I’ve provided instructions and created an allow list of commands so it creates local backups of files before it touches them, and I always proceed through a plan process for any task that is slightly more complicated, followed by plan cleanup, and execution. I’m super specific about my tech stack and coding expectations too. Tests can be hard to prompt, I’ll sometimes just write those up by hand. Also, I’ve never had to pay over my $60 a month pro plan price tag. I can’t figure out how others are even doing this. At any rate, I think the problem appears to be the blind commands of “make this thing, make it good, no bugs” and “this broke. Fix!” I kid you not, I see this all the time with devs. Not at all saying this is what you do, just saying it’s out there. And “high quality code” doesn’t actually mean anything. You have to define what that means to you. Good code to me may be slop to you, but who knows unless it is defined. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | recroad 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Works pretty great for me, especially Spec-driven development using OpenSpec - Cleaner code - Easily 5x speed minimum - Better docs, designs - Focus more on the product than than the mechanics - More time for family | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | highspeedbus 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Honestly, I only use coding agents when I feel too lazy to type lots of boilerplate code. As in "Please write just this one for me". Even still, I take care to review each line produced. The key is making small changes at a time. Otherwise, I type out and think about everything being done when in ‘Flow State’. I don't like the feeling of vibe coding for long periods. It completely changes the way work is done, it takes away agency. On a bit of a tangent, I can't get in Flow State when using agents. At least not as we usually define it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jorgeleo 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I did the same experiment as you, and this is what I learned: https://www.linkedin.com/pulse/concrete-vibe-coding-jorge-va... The bottom line is this: * The developer stop been a developer, and becomes a product designer with high technical skills.
* Agents will behave like junior developers, they can type very fast, and produce something that has a high probability to work. They priority will be to make it work, not maintainability, scalability, etc. Agents can achieve that if you detail how to produce it.
* When I start to work on a product that will be vibe coded, I need to have clear in my head all the user stories, code architecture, the whole system, then I can start to tell the agent what to build, and correct and annotate in the md files the code quality decisions so it remembers them.* Use TDD, ask the agent to create the tests, and then code to the test. Don't correct the bugs, make the agent correct them and explain why that is a bug, specially with code design decisions. Store those in AGENTS.md file at the root of the project. There are more things that can be done to guide the agent, but I need to have clear in an articulable way the direction of the coding. On the other side, I don't worry about implementation details like how to use libraries and APIs that I am not familiar with, the agent just writes and I test. Currently I am working on a product and I can tell you, working no more than 10 hours a week (2 hours here, 3 there, leave the agent working while I am having dinner with family) I am progressing at I would say 5 to 10 times faster than without it. So, yeah it works, but I had to adjust how I do my job. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | saikatsg 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Scaling long-running autonomous coding https://news.ycombinator.com/item?id=46624541 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | damnitbuilds 21 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
You are asking two very different questions here. i.e. You are asking a question about whether using agents to write code is net-positive, and then you go on about not reviewing the code agents produce. I suspect agents are often net-positive AND one has to review their code. Just like most people's code. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | Kerrick 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes. Over the last month, I've made heavy use of agentic coding (a bit of Junie and Amp, but mostly Antigravity) to ship https://www.ratatui-ruby.dev from scratch. Not just the website... the entire thing. The main library (rubygem) has 3,662 code lines and 9,199 comment lines of production Ruby and 4,933 code lines and 710 comment lines of Rust. There are a further 6,986 code lines and 2,304 comment lines of example applications code using the library as documentation, and 4,031 lines of markdown documentation. Plus, 11,902 code lines and 2,164 comment lines of automated tests. Oh, and 4,250 lines in bin/ and tasks/ but those are lower-quality "internal" automation scripts and apps. The library is good enough that Sidekiq is using it to build their TUI. https://github.com/sidekiq/sidekiq/issues/6898 But that's not all I've built over this timeframe. I'm also a significant chunk of the way through an MVU framework, https://rooibos.run, built on top of it. That codebase is 1,163 code lines and 1,420 comment lines of production Ruby, 4,749 code lines and 521 comment lines of automated tests. I need to add to the 821 code lines 221 comment lines of example application code using the framework as documentation, and to the 2,326 lines of markdown documentation. It's been going so well that the plan is to build out an ecosystem: the core library, an OOP and an FP library, and a set of UI widgets. There are 6,192 lines of markdown in the Wik about it: mailing list archives, AI chat archives, current design & architecture, etc. For context, I am a long-time hobbyist Rubyist but I cannot write Rust. I have very little idea of the quality of the Rust code beyond what static analyzers and my test suite can tell me. It's all been done very much in public. You can see every commit going back to December 22 in the git repos linked from the "Sources" tab here: https://sr.ht/~kerrick/ratatui_ruby/ If you look at the timestamps you'll even notice the wild difference between my Christmas vacation days, and when I went back to work and progress slowed. You can also see when I slowed down to work on distractions like https://git.sr.ht/~kerrick/ramforge/tree and https://git.sr.ht/~kerrick/semantic_syntax/tree. If it keeps going as well as it has, I may be able to rival Charm's BubbleTea and Bubbles by summertime. I'm doing this to give Rubyists the opportunity to participate in the TUI renaissance... but my ultimate goal is to give folks who want to make a TUI a reason to learn Ruby instead of Go or Rust. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | PaulHoule 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Treat it as a pair programmer. Ask it questions like "How do I?", "When I do X, Y happens, why is that?", "I think Z, prove me wrong" or "I want to do P, how do you think we should do it?" Feed it little tasks (30 s-5 min) and if you don't like this or that about the code it gives you either tell it something like
or edit something yourself and say
If you want to use it as a junior dev who gets sent off to do tickets and comes back with a patch three days later that will fail code review be my guest, but I greatly enjoy working with a tight feedback loop. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | hahahahhaah 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes. Caveat: can't be pure vibes. Needs ownership, care, review and willingness to git reset and try again when needed. Needs a lot of tests. Cavaet: Greenfield. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dlandis 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Last weekend I tried building an iOS app for pet feeding reminders from scratch. Just start smaller. I'm not sure why people try to jump immediately to creating an entire app when they haven't even gotten any net-positive results at all yet. Just start using it for small time saving activities and then you will naturally figure out how to gradually expand the scope of what you can use it for. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dionian 4 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I review it as i generate it. for quality. i guide it to be self-testing. create unit tests and integration tests according to my standards | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | cypherfox an hour ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Let me give a concrete example. I had a tool I built ten years ago on Rails 5.2. It's decent, mildly complex for a 1-man project, and I wanted to refresh it. Current Rails is 8. I've done upgrades before and it's...rough going more than one version up. It's _such_ a pain to get it right. I pointed Claude Code at it, and a few hours later, it had done all of the hard work. I babysat it, but I was doing other things while it worked. I didn't verify all the code changes (although I did skim the resultant PR, especially for security concerns) but it worked. It rewrote my extensive hand-rolled Coffeescript into modern JavaScript, which was also nice; it did it perfectly. The tests passed, and it even uncovered some issues that I had it fix afterwards. (Places where my security settings weren't as good as they should have been, or edge cases I hadn't thought of ten years ago.) Now could I have done this? Yes, of course. I've done it before with other projects. But it *SUCKS* to do manually. Some folks suggest that you should only use these tools for tasks you COULD do, but would be annoyed to do. I kind of like that metric, but I bet my bar for annoyance will go down over time. My experience with these systems is that they aren't significantly faster, ultimately, but I hate the sucky parts of my job VASTLY less. And there are a lot of sucky parts to even the code-creation side of programming. I *love* my career and have been doing it for 36 years, but like anything that you're very experienced in, you know the parts that suck. Like some others, it helps that my most recent role was Staff Software Engineer, and so I was delegating and looking over the results of other folks work more than hand-rolling code. So the 'suggest and review' pattern is one that I'm very comfortable with, along with clearly separate small-scale plan and execute steps. Ultimately I find these tools reduce cognitive load, which makes me happier when I'm building systems, so I don't care as much if I'm strictly faster. If at the end of the day I made progress and am not exhausted, that's a win. And the LLM coding tools deliver that for me, at least. One of the things I've also had to come to terms with _in large companies_ is that the code is __never__ high quality. If you drill into almost any part of a huge codebase, you're going to start questioning your sanity (obligatory 'Programming Sucks' reference). Whether it's a single complex 750 line C++ function at the heart of a billion dollar payment system, or 2,000 lines in a single authentication function in a major CRM tool, or a microservice with complex deployment rules that just exists to unwrap a JWT, or 13 not-quite-identical date time picker libraries in one codebase, the code in any major system is not universally high quality. But it works. And there are always *very good reasons* why it was built that way. Those are the forces that were on the development team when it was built, and you don't usually know them, and you mustn't be a jerk about it. Many folks new to a team don't get that, and create a lot of friction, only to learn Chesterton's Fence all over again. Coming to terms with this over the course of my career has also made coming to terms with the output of LLMs being functional, but not high quality, easier. I'm sure some folks will call this 'accepting mediocrity' and that's okay. I'd rather ship working code. (_And to be clear, this is excepting security vulnerabilities and things that will lose data. You always review for those kinds of errors, but even for those, reviews are made somewhat easier with LLMs._) N.b. I pay for Claude Code, but I regularly test local coding models on my ML server in my homelab. The local models and tooling is getting surprisingly good...but not there yet. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | fragmede 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Care to share the pet feeder's code and what the bugs are and how it went off the rails? Seems like a perfect scenario for us to see how much is prompting skill, how much is a setup, how much is just patience for the thing, and how much is hype/lies. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | koakuma-chan 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> The product has to work, but the code must also be high-quality. I think in most cases the speed at which AI can produce code outweighs technical debt, etc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | allisdust 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
You need to perturb the token distribution by overlaying multiple passes. Any strategy that does this would work. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | mythrwy 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I haven't (yet) tried Claude but have good experiences with Codex CLI the last few weeks. Previously I tried to use Aider and openAI about 6 or 7 months ago and it was terrible mess. I went back to pasting snippets in the browser chat window until a few weeks ago and thought agents were mostly hype (was wrong). I keep a browser chat window open to talk about the project at a higher level. I'll post command line output like `ls` and `cat` to the higher level chat and use Codex strictly for coding. I haven't tried to one shot anything. I just give it a smallish piece of work at a time and check as it goes in a separate terminal window. I make the commits and delete files (if needed) and anything administrative. I don't have any special agent instructions. I do give Codex good hints on where to look or how to handle things. It's probably a bit slower than what some people are doing but it's still very fast and so far has worked well. I'm a bit cautious because of my previous experience with Aider which was like roller skating drunk while juggling open straight razors and which did nothing but make a huge mess (to be fair I didn't spend much time trying to tame it). I'm not sold on Codex or openAI compared to other models and will likely try other agents later, but so far it's been good. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 3vidence 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Googler opinions are my own. If agentic coding worked as well as people claimed on large codebases I would be seeing a massive shift at my Job... Im really not seeing it. We have access to pretty much all the latest and greatest internally at no cost and it still seems the majority of code is still written and reviewed by people. AI assisted coding has been a huge help to everyone but straight up agentic coding seems like it does not scale to these very large codebases. You need to keep it on the rails ALL THE TIME. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | TomWizOverlord 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is 1/3 response to a short prompt about implementation options for GitHub Runner form broken Server to Github Enterprise Cloud: # EC2-Based GitHub Actions Self-Hosted Runners - Complete Implementation ## Architecture Overview This solution deploys auto-scaling GitHub Actions runners on EC2 instances that can trigger your existing AWS CodeBuild pipelines. Runners are managed via Auto Scaling Groups with automatic registration and health monitoring. ## Prerequisites - AWS CLI configured with appropriate credentials - GitHub Enterprise Cloud organization admin access - Existing CodeBuild project(s) - VPC with public/private subnets ## Solution Components ### 1. CloudFormation Template### 2. GitHub Workflow for CodeBuild Integration## Deployment Steps ### Step 1: Create GitHub Personal Access Token 1. Navigate to GitHub → Settings → Developer settings → Personal access tokens → Fine-grained tokens 2. Create token with these permissions: - *Repository permissions:* - Actions: Read and write - Metadata: Read - *Organization permissions:* - Self-hosted runners: Read and write ```bash # Store token securely export GITHUB_PAT="ghp_xxxxxxxxxxxxxxxxxxxx" export GITHUB_ORG="your-org-name" ``` ### Step 2: Deploy CloudFormation Stack ```bash # Set variables export AWS_REGION=us-east-1 export STACK_NAME=github-runner-ec2 export VPC_ID=vpc-xxxxxxxx export SUBNET_IDS="subnet-xxxxxxxx,subnet-yyyyyyyy" # Deploy stack aws cloudformation create-stack \ --stack-name $STACK_NAME \ --template-body file://github-runner-ec2-asg.yaml \ --parameters \ ParameterKey=VpcId,ParameterValue=$VPC_ID \ ParameterKey=PrivateSubnetIds,ParameterValue=\"$SUBNET_IDS\" \ ParameterKey=GitHubOrganization,ParameterValue=$GITHUB_ORG \ ParameterKey=GitHubPAT,ParameterValue=$GITHUB_PAT \ ParameterKey=InstanceType,ParameterValue=t3.medium \ ParameterKey=MinSize,ParameterValue=2 \ ParameterKey=MaxSize,ParameterValue=10 \ ParameterKey=DesiredCapacity,ParameterValue=2 \ ParameterKey=RunnerLabels,ParameterValue="self-hosted,linux,x64,ec2,aws,codebuild" \ ParameterKey=CodeBuildProjectNames,ParameterValue="" \ --capabilities CAPABILITY_NAMED_IAM \ --region $AWS_REGION # Wait for completion (5-10 minutes) aws cloudformation wait stack-create-complete \ --stack-name $STACK_NAME \ --region $AWS_REGION # Get stack outputs aws cloudformation describe-stacks \ --stack-name $STACK_NAME \ --query 'Stacks[0].Outputs' \ --region $AWS_REGION ``` ### Step 3: Verify Runners ```bash # Check Auto Scaling Group ASG_NAME=$(aws cloudformation describe-stacks \ --stack-name $STACK_NAME \ --query 'Stacks[0].Outputs[?OutputKey==`AutoScalingGroupName`].OutputValue' \ --output text) aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --region $AWS_REGION # List running instances aws ec2 describe-instances \ --filters "Name=tag:aws:autoscaling:groupName,Values=$ASG_NAME" \ --query 'Reservations[].Instances[].[InstanceId,State.Name,PrivateIpAddress]' \ --output table # Check CloudWatch logs aws logs tail /github-runner/instances --follow ``` ### Step 4: Verify in GitHub Navigate to: `https://github.com/organizations/YOUR_ORG/settings/actions/r...` You should see your EC2 runners listed as "Idle" with labels: `self-hosted, linux, x64, ec2, aws, codebuild` ## Using One Runner for Multiple Repos & Pipelines ### Organization-Level Runners (Recommended) EC2 runners registered at the organization level can serve all repositories automatically. *Benefits:* - Centralized management - Cost-efficient resource sharing - Simplified scaling - Single point of monitoring *Configuration in CloudFormation:* The template already configures organization-level runners via the UserData script: ```bash ./config.sh --url "https://github.com/${GitHubOrganization}" ... ``` ### Multi-Repository Workflow Examples### Advanced: Runner Groups for Access Control### Label-Based Runner Selection Strategy *Create different runner pools with specific labels:* ```bash # Production runners RunnerLabels: "self-hosted,linux,ec2,production,high-performance" # Development runners RunnerLabels: "self-hosted,linux,ec2,development,general" # Team-specific runners RunnerLabels: "self-hosted,linux,ec2,team-platform,specialized" ``` *Use in workflows:* ```yaml jobs: prod-deploy: runs-on: [self-hosted, linux, ec2, production]
```## Monitoring and Maintenance ### Monitor Runner Health ```bash # Check Auto Scaling Group health aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --query 'AutoScalingGroups[0].[DesiredCapacity,MinSize,MaxSize,Instances[].[InstanceId,HealthStatus,LifecycleState]]' # View instance system logs INSTANCE_ID=$(aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --query 'AutoScalingGroups[0].Instances[0].InstanceId' \ --output text) aws ec2 get-console-output --instance-id $INSTANCE_ID # Check CloudWatch logs aws logs get-log-events \ --log-group-name /github-runner/instances \ --log-stream-name $INSTANCE_ID/runner \ --limit 50 ``` ### Connect to Runner Instance (via SSM) ```bash # List instances aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --query 'AutoScalingGroups[0].Instances[].[InstanceId,HealthStatus]' \ --output table # Connect via Session Manager (no SSH key needed) aws ssm start-session --target $INSTANCE_ID # Once connected, check runner status sudo systemctl status actions.runner. sudo journalctl -u actions.runner.* -f ``` ### Troubleshooting Common Issues## Advanced Scaling Configuration ### Lambda-Based Dynamic Scaling For more sophisticated scaling based on GitHub Actions queue depth:### Deploy Scaling Lambda ```bash # Create Lambda function zip function.zip github-queue-scaler.py aws lambda create-function \ --function-name github-runner-scaler \ --runtime python3.11 \ --role arn:aws:iam::ACCOUNT_ID:role/lambda-execution-role \ --handler github-queue-scaler.lambda_handler \ --zip-file fileb://function.zip \ --timeout 30 \ --environment Variables="{ ASG_NAME=$ASG_NAME, GITHUB_ORG=$GITHUB_ORG, GITHUB_TOKEN=$GITHUB_PAT, MAX_RUNNERS=10, MIN_RUNNERS=2 }" # Create CloudWatch Events rule to trigger every 2 minutes aws events put-rule \ --name github-runner-scaling \ --schedule-expression 'rate(2 minutes)' aws events put-targets \ --rule github-runner-scaling \ --targets "Id"="1","Arn"="arn:aws:lambda:REGION:ACCOUNT:function:github-runner-scaler" ``` ## Cost Optimization ### 1. Use Spot Instances Add to Launch Template in CloudFormation: ```yaml LaunchTemplateData: InstanceMarketOptions: MarketType: spot SpotOptions: MaxPrice: "0.05" # Set max price SpotInstanceType: one-time ``` ### 2. Scheduled Scaling Scale down during off-hours: ```bash # Scale down at night (9 PM) aws autoscaling put-scheduled-action \ --auto-scaling-group-name $ASG_NAME \ --scheduled-action-name scale-down-night \ --recurrence "0 21 * * " \ --desired-capacity 1 # Scale up in morning (7 AM) aws autoscaling put-scheduled-action \ --auto-scaling-group-name $ASG_NAME \ --scheduled-action-name scale-up-morning \ --recurrence "0 7 * MON-FRI" \ --desired-capacity 3 ``` ### 3. Instance Type Mix Use multiple instance types for better availability and cost: ```yaml MixedInstancesPolicy: InstancesDistribution: OnDemandBaseCapacity: 1 OnDemandPercentageAboveBaseCapacity: 25 SpotAllocationStrategy: price-capacity-optimized LaunchTemplate: Overrides: - InstanceType: t3.medium - InstanceType: t3a.medium - InstanceType: t2.medium ``` ## Security Best Practices 1. *No hardcoded credentials* - Using Secrets Manager for GitHub PAT 2. *IMDSv2 enforced* - Prevents SSRF attacks 3. *Minimal IAM permissions* - Scoped to specific CodeBuild projects 4. *Private subnets* - Runners not directly accessible from internet 5. *SSM for access* - No SSH keys needed 6. *Encrypted secrets* - Secrets Manager encryption at rest 7. *CloudWatch logging* - All runner activity logged ## References - [GitHub Self-hosted Runners Documentation](https://docs.github.com/en/actions/hosting-your-own-runners/...) - [GitHub Runner Registration API](https://docs.github.com/en/rest/actions/self-hosted-runners) - [AWS Auto Scaling Documentation](https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-i...) - [AWS CodeBuild API Reference](https://docs.aws.amazon.com/codebuild/latest/APIReference/We...) - [GitHub Actions Runner Releases](https://github.com/actions/runner/releases) - [AWS Systems Manager Session Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide...) This solution provides a production-ready, cost-effective EC2-based runner infrastructure with automatic scaling, comprehensive monitoring, and multi-repository support for triggering CodeBuild pipelines. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | hakanderyal 3 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I've been increasingly removing myself from the typing part since August. For the last few months, I haven't written a single line of code, despite producing a lot more. I'm using Claude Code. I've been building software as a solo freelancer for the last 20+ years. My latest workflow - I work on "regular" web apps, C#/.NET on backend, React on web. - I'm using 3-8 sessions in parallel, depending on the tasks and the mental bandwidth I have, all visible on external display. - I've markdown rule files & documentation, 30k lines in total. Some of them describes how I want the agent to work (rule files), some of them describes the features/systems of the app. - Depending on what I'm working on, I load relevant rule files selectively into the context via commands. I have a /fullstack command that loads @backend.md, @frontend.md and a few more. I have similar /frontend, /backend, /test commands with a few variants. These are the load bearing columns of my workflow. Agents takes a lot more time and produces more slop without these. Each one is written by agents also, with my guidance. They evolve based on what we encounter. - Every feature in the app, and every system, has a markdown document that's created by the implementing agent, describing how it works, what it does, where it's used, why it's created, main entry points, main logic, gotchas specific to this feature/system etc. After every session, I have /write-system, /write-feature commands that I use to make the agent create/update those, with specific guidance on verbosity, complexity, length. - Each session I select a specific task for a single system. I reference the relevant rule files and feature/system doc, and describe what I want it to achieve and start plan mode. If there are existing similar features, I ask the agent to explore and build something similar. - Each task is specifically tuned to be planned/worked in a single session. This is the most crucial role of mine. - For work that would span multiple sessions, I use a single session to create the initial plan, then plan each phase in depth in separate sessions. - After it creates the plan, I examine, do a bit of back and forth, then approve. - I watch it while it builds. Usually I have 1-2 main tasks and a few subtasks going in parallel. I pay close attention to main tasks and intervene when required. Subtasks rarely requires intervention due to their scope. - After the building part is done, I go through the code via editor, test manually via UI, while the agent creates tests for the thing we built, again with specific guidance on what needs to be tested and how. Since the plan is pre-approved by me, this step usually goes without a hitch. - Then I make the agent create/update the relevant documents. - Last week I built another system to enhance that flow. I created a /devlog command. With the assist of some CLI tools and cladude log parsing, it creates a devlog file with some metadata (tokens, length, files updated, docs updated etc) and agent fills it with a title, summary of work, key decisions, lessons learned. First prompt is also copied there. These also get added to the relevant feature/system document automatically as changelog entries. So, for every session, I've a clear document about what got done, how long it took, what was the gotchas, what went right, what went wrong etc. This proved to be invaluable even with a week worth of develops, and allows me to further refine my workflows. This looks convoluted at a first glance, but it's evolved over the months and works great. The code quality is almost the same with what I would have written by myself. All because of existing code to use as examples, and the rule files guiding the agents. I was already a fast builder before, but with agents it's a whole new level. And this flow really unlocked with Opus 4.5. Sonnet 3.5/4/4.5 was also working OK, but required a lot more handholding and steering and correction. Parallel sessions wasn't really possible without producing slop. Opus 4.5 is significantly better. More technical/close-to-hardware work will most likely require a different set of guidance & flow to create non-slop code. I don't have any experience there. You need to invest in improving the workflow. The capacity is there in the models. The results all depends on how you use them. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | cat_plus_plus 18 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Do not blame the tools? Given a clear description (overall design, various methods to add, inputs, outputs), Google Antigravity often writes better zero shot code than an average human engineer - consistent checks for special cases, local optimizations, extensive comments, thorough text coverage. Now in terms of reviews, the real focus is reviewing your own code no matter which tools you used to write it, vi or agentic AI IDE, not someone else reviewing your code. The later is a safety/mentorship tool in the best circumstances and all too often just an excuse for senior architects to assert their dominance and justify their own existence at the expense of causing unnecessary stress and delaying getting things shipped. Now in terms of using AI, the key is to view yourself as a technical lead, not a people manager. You don't stop coding completely or treat underlying frameworks as a black box, you just do less of it. But at some point fixing a bug yourself is faster than writing a page of text explaining exactly how you want it fixed. Although when you don't know the programming language, giving pseudocode or sample code in another language can be super handy. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | lostmsu 19 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It works in the sense that there are lots of professional (as in they earn money from software engineering) developers out there who do the work of exactly same quality. I would even bet they are the majority (or at least were prior to late 2024). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | TomWizOverlord 6 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
claude response to a query to give options for GitHub runner ... it haas generated 3 more files to review test and make it work: # EC2-Based GitHub Actions Self-Hosted Runners - Complete Implementation ## Architecture Overview This solution deploys auto-scaling GitHub Actions runners on EC2 instances that can trigger your existing AWS CodeBuild pipelines. Runners are managed via Auto Scaling Groups with automatic registration and health monitoring. ## Prerequisites - AWS CLI configured with appropriate credentials - GitHub Enterprise Cloud organization admin access - Existing CodeBuild project(s) - VPC with public/private subnets ## Solution Components ### 1. CloudFormation Template### 2. GitHub Workflow for CodeBuild Integration## Deployment Steps ### Step 1: Create GitHub Personal Access Token 1. Navigate to GitHub → Settings → Developer settings → Personal access tokens → Fine-grained tokens 2. Create token with these permissions: - *Repository permissions:* - Actions: Read and write - Metadata: Read - *Organization permissions:* - Self-hosted runners: Read and write ```bash # Store token securely export GITHUB_PAT="ghp_xxxxxxxxxxxxxxxxxxxx" export GITHUB_ORG="your-org-name" ``` ### Step 2: Deploy CloudFormation Stack ```bash # Set variables export AWS_REGION=us-east-1 export STACK_NAME=github-runner-ec2 export VPC_ID=vpc-xxxxxxxx export SUBNET_IDS="subnet-xxxxxxxx,subnet-yyyyyyyy" # Deploy stack aws cloudformation create-stack \ --stack-name $STACK_NAME \ --template-body file://github-runner-ec2-asg.yaml \ --parameters \ ParameterKey=VpcId,ParameterValue=$VPC_ID \ ParameterKey=PrivateSubnetIds,ParameterValue=\"$SUBNET_IDS\" \ ParameterKey=GitHubOrganization,ParameterValue=$GITHUB_ORG \ ParameterKey=GitHubPAT,ParameterValue=$GITHUB_PAT \ ParameterKey=InstanceType,ParameterValue=t3.medium \ ParameterKey=MinSize,ParameterValue=2 \ ParameterKey=MaxSize,ParameterValue=10 \ ParameterKey=DesiredCapacity,ParameterValue=2 \ ParameterKey=RunnerLabels,ParameterValue="self-hosted,linux,x64,ec2,aws,codebuild" \ ParameterKey=CodeBuildProjectNames,ParameterValue="" \ --capabilities CAPABILITY_NAMED_IAM \ --region $AWS_REGION # Wait for completion (5-10 minutes) aws cloudformation wait stack-create-complete \ --stack-name $STACK_NAME \ --region $AWS_REGION # Get stack outputs aws cloudformation describe-stacks \ --stack-name $STACK_NAME \ --query 'Stacks[0].Outputs' \ --region $AWS_REGION ``` ### Step 3: Verify Runners ```bash # Check Auto Scaling Group ASG_NAME=$(aws cloudformation describe-stacks \ --stack-name $STACK_NAME \ --query 'Stacks[0].Outputs[?OutputKey==`AutoScalingGroupName`].OutputValue' \ --output text) aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --region $AWS_REGION # List running instances aws ec2 describe-instances \ --filters "Name=tag:aws:autoscaling:groupName,Values=$ASG_NAME" \ --query 'Reservations[].Instances[].[InstanceId,State.Name,PrivateIpAddress]' \ --output table # Check CloudWatch logs aws logs tail /github-runner/instances --follow ``` ### Step 4: Verify in GitHub Navigate to: `https://github.com/organizations/YOUR_ORG/settings/actions/r...` You should see your EC2 runners listed as "Idle" with labels: `self-hosted, linux, x64, ec2, aws, codebuild` ## Using One Runner for Multiple Repos & Pipelines ### Organization-Level Runners (Recommended) EC2 runners registered at the organization level can serve all repositories automatically. *Benefits:* - Centralized management - Cost-efficient resource sharing - Simplified scaling - Single point of monitoring *Configuration in CloudFormation:* The template already configures organization-level runners via the UserData script: ```bash ./config.sh --url "https://github.com/${GitHubOrganization}" ... ``` ### Multi-Repository Workflow Examples### Advanced: Runner Groups for Access Control### Label-Based Runner Selection Strategy *Create different runner pools with specific labels:* ```bash # Production runners RunnerLabels: "self-hosted,linux,ec2,production,high-performance" # Development runners RunnerLabels: "self-hosted,linux,ec2,development,general" # Team-specific runners RunnerLabels: "self-hosted,linux,ec2,team-platform,specialized" ``` *Use in workflows:* ```yaml jobs: prod-deploy: runs-on: [self-hosted, linux, ec2, production]
```## Monitoring and Maintenance ### Monitor Runner Health ```bash # Check Auto Scaling Group health aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --query 'AutoScalingGroups[0].[DesiredCapacity,MinSize,MaxSize,Instances[].[InstanceId,HealthStatus,LifecycleState]]' # View instance system logs INSTANCE_ID=$(aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --query 'AutoScalingGroups[0].Instances[0].InstanceId' \ --output text) aws ec2 get-console-output --instance-id $INSTANCE_ID # Check CloudWatch logs aws logs get-log-events \ --log-group-name /github-runner/instances \ --log-stream-name $INSTANCE_ID/runner \ --limit 50 ``` ### Connect to Runner Instance (via SSM) ```bash # List instances aws autoscaling describe-auto-scaling-groups \ --auto-scaling-group-names $ASG_NAME \ --query 'AutoScalingGroups[0].Instances[].[InstanceId,HealthStatus]' \ --output table # Connect via Session Manager (no SSH key needed) aws ssm start-session --target $INSTANCE_ID # Once connected, check runner status sudo systemctl status actions.runner. sudo journalctl -u actions.runner.* -f ``` ### Troubleshooting Common Issues## Advanced Scaling Configuration ### Lambda-Based Dynamic Scaling For more sophisticated scaling based on GitHub Actions queue depth:### Deploy Scaling Lambda ```bash # Create Lambda function zip function.zip github-queue-scaler.py aws lambda create-function \ --function-name github-runner-scaler \ --runtime python3.11 \ --role arn:aws:iam::ACCOUNT_ID:role/lambda-execution-role \ --handler github-queue-scaler.lambda_handler \ --zip-file fileb://function.zip \ --timeout 30 \ --environment Variables="{ ASG_NAME=$ASG_NAME, GITHUB_ORG=$GITHUB_ORG, GITHUB_TOKEN=$GITHUB_PAT, MAX_RUNNERS=10, MIN_RUNNERS=2 }" # Create CloudWatch Events rule to trigger every 2 minutes aws events put-rule \ --name github-runner-scaling \ --schedule-expression 'rate(2 minutes)' aws events put-targets \ --rule github-runner-scaling \ --targets "Id"="1","Arn"="arn:aws:lambda:REGION:ACCOUNT:function:github-runner-scaler" ``` ## Cost Optimization ### 1. Use Spot Instances Add to Launch Template in CloudFormation: ```yaml LaunchTemplateData: InstanceMarketOptions: MarketType: spot SpotOptions: MaxPrice: "0.05" # Set max price SpotInstanceType: one-time ``` ### 2. Scheduled Scaling Scale down during off-hours: ```bash # Scale down at night (9 PM) aws autoscaling put-scheduled-action \ --auto-scaling-group-name $ASG_NAME \ --scheduled-action-name scale-down-night \ --recurrence "0 21 * * " \ --desired-capacity 1 # Scale up in morning (7 AM) aws autoscaling put-scheduled-action \ --auto-scaling-group-name $ASG_NAME \ --scheduled-action-name scale-up-morning \ --recurrence "0 7 * MON-FRI" \ --desired-capacity 3 ``` ### 3. Instance Type Mix Use multiple instance types for better availability and cost: ```yaml MixedInstancesPolicy: InstancesDistribution: OnDemandBaseCapacity: 1 OnDemandPercentageAboveBaseCapacity: 25 SpotAllocationStrategy: price-capacity-optimized LaunchTemplate: Overrides: - InstanceType: t3.medium - InstanceType: t3a.medium - InstanceType: t2.medium ``` ## Security Best Practices 1. *No hardcoded credentials* - Using Secrets Manager for GitHub PAT 2. *IMDSv2 enforced* - Prevents SSRF attacks 3. *Minimal IAM permissions* - Scoped to specific CodeBuild projects 4. *Private subnets* - Runners not directly accessible from internet 5. *SSM for access* - No SSH keys needed 6. *Encrypted secrets* - Secrets Manager encryption at rest 7. *CloudWatch logging* - All runner activity logged ## References - [GitHub Self-hosted Runners Documentation](https://docs.github.com/en/actions/hosting-your-own-runners/...) - [GitHub Runner Registration API](https://docs.github.com/en/rest/actions/self-hosted-runners) - [AWS Auto Scaling Documentation](https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-i...) - [AWS CodeBuild API Reference](https://docs.aws.amazon.com/codebuild/latest/APIReference/We...) - [GitHub Actions Runner Releases](https://github.com/actions/runner/releases) - [AWS Systems Manager Session Manager](https://docs.aws.amazon.com/systems-manager/latest/userguide...) This solution provides a production-ready, cost-effective EC2-based runner infrastructure with automatic scaling, comprehensive monitoring, and multi-repository support for triggering CodeBuild pipelines. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | fourthrigbt 5 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I’ve had a major conversion on this topic within the last month. I’m not exactly a typical SWE at the moment. The role I’m in is a lot of meeting with customers, understand their issues, and whip up demos to show how they might apply my company’s products to their problem. So I’m not writing production code, but I am writing code that I want to to be maintainable and changeable so I can stash a demo for a year and then spin it up quickly when someone wants to see if or update/adapt it as products/problems change. Most of my career has been spent writing aircraft SW so I am heavily biased toward code quality and assurance. The demos I am building are not trivial or common in the training data. They’re highly domain specific and pretty niche, performance is very important, and usually span low level systems code all the way up to a decent looking gui. As a made up example, it wouldn’t be unusual for me to have a project to write a medical imaging pipeline from scratch that employs modern techniques from recent papers, etc. Up until very recently, I only thought coding agents were useful for basic crud apps, etc. I said the same things a lot of people on this thread are saying, eg. people on twitter are all hype, their experience doesn’t match mine, they must be working on easy problems or be really bad at writing code I recently decided to give into the hype and really try to use the tooling and… it’s kind of blown my mind. Cursor + opus 4.5 high are my main tools and their ability to one shot major changes across many files and hundreds of lines of code, encompassing low level systems stuff, GOU accelerated stuff, networking, etc. It’s seriously altering my perception of what software engineering is and will be and frankly I’m still kind of recoiling from it. Don’t get me wrong, I don’t believe it fundamentally eliminates the need for SWEs, it still takes a lot of work on my part to come up with a spec (though I do have it help me with that part), correct things that I don’t like in its planning or catch it doing the wrong thing in real time in and re direct it. And it will make strange choices that I need to correct on the back end sometimes. But it has legitimately allowed me to build 10x faster than I probably could on my own. Maybe the most important thing about it is what it enables you to build that would have been not worth the trouble before, Stuff like wrapping tools in really nice flexible TUIs, creating visualizations/dashboards/benchmark, slightly altering how an application works to cover a use case you hadn’t thought of before, wrapping an interface so it’s easy to swap libs/APIs later, etc. If you are still skeptical, I would highly encourage you to immerse yourself in the SOTS tools right now and just give in to the hype for a bit, because I do think we’re rapidly going to reach a point here where if you aren’t using these tools you won’t be employable. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | vonneumannstan 17 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Claude Cowork was apparently completely written by Claude Code. So this appears to yet again be a skill issue. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | lostsock 10 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I’m honestly kind of amazed that more people aren’t seeing the value, because my experience has been almost the opposite of what you’re describing. I agree with a lot of your instincts. Shipping unreviewed code is wrong. “Validate behavior not architecture” as a blanket rule is reckless. Tests passing is not the same thing as having a system you can reason about six months later. On that we’re aligned. Where I diverge is the conclusion that agentic coding doesn’t produce net-positive results. For me it very clearly does, but perhaps it's very situation or condition dependent? For me, I don’t treat the agent as a junior engineer I can hand work to and walk away from. I treat it more like an extremely fast, extremely literal staff member who will happily do exactly what you asked, including the wrong thing, unless you actively steer it. I sit there and watch it work (usually have 2-3 agents working at the same time, ideally on different codebases but sometimes they overlap). I interrupt it. I redirect it. I tell it when it is about to do something dumb. I almost never write code anymore, but I am constantly making architectural calls. Second, tooling and context quality matter enormously. I’m using Claude Code. The MCP tools I have installed make a huge different: laravel-boost, context7, and figma (which in particular feels borderline magical at converting designs into code!). I often have to tell the agent to visit GitHub READMEs and official docs instead of letting it hallucinate “best practices”, the agent will oftentimes guess and get stack, so if it's doing that, you’ve already lost. Third, I wonder if perhaps starting from scratch is actually harder than migrating something real. Right now I’m migrating a backend from Java to Laravel and rebuilding native apps into KMP and Compose Multiplatform. So the domain and data is real and I can validate against a previous (if buggy) implimentation). In that environment, the agent is phenomenal. It understands patterns, ports logic faithfully, flags inconsistencies, and does a frankly ridiculous amount of correct work per hour. Does it make mistakes? Of course. But they’re few and far between, and they’re usually obvious at the architectural or semantic level, not subtle landmines buried in the code. When something is wrong, it’s wrong in a way that’s easy to spot if you’re paying attention. That’s the part I think gets missed. If you ask the agent to design, implement, review, and validate itself, then yes, you’re going to get spaghetti with a test suite that lies to you. If instead you keep architecture and taste firmly in human hands and use the agent as an execution engine, the leverage is enormous. My strong suspicion is that a lot of the negative experiences come from a mismatch between expectations and operating model. If you expect the agent to be autonomous, it will disappoint you. If you expect it to be an amplifier for someone who already knows what “good” looks like, it’s transformative. So while I guess plenty of hype exists, for me at least, they hype is justified. I’m shipping way (WAY!) more, with better consistency, and with less cognitive exhaustion than ever before in my 20+ years of doing dev work. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | dmitrygr 8 hours ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1. give it toy assignment which is a simplified subcomponent of your actual task 2. wait 3. post on LinkedIn about how amazing AI now is 4. throw away the slop and write proper code 5. go home, to repeat this again tomorrow | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | nickphx 18 hours ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
lol no. it is all fomo clown marketing. they make outlandish claims and all fall short of producing anything more than noise. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||