| ▲ | rtpg 5 days ago |
| I understand the pitch here ("it finds bugs! it's basically all upside because worst case there's no output anyways"), but I'm finding some of these agents to be ... uhhh... kind of aggressive at trying to find the solution, and they end up missing the forest for the trees. And there's some "oh you should fix this" stuff which, while sometimes not _wrong_, is completely beside the point. The end result is these robots bikeshedding. When paired with junior engineers looking at this output and deciding to act on it, it just generates busywork. It doesn't help that everyone and their dog wants to automatically run their agent against PRs now. I'm trying to use these to some extent when I find myself in a canonical situation that should work, and in many cases I'm not getting the value everyone else seems to get. Very much a "trying to explain a thing to a junior engineer takes more time than doing it myself" situation, except at least the junior is a person. |
|
| ▲ | joshvm 5 days ago | parent | next [-] |
| When models start to forage around in the weeds, it's a good idea to restart the session and add more information to the prompt about what it should ignore or assume. For example, in ML projects Claude gets very worried that datasets aren't available or are perhaps responsible for the bug. Usually if you tell it where you suspect the bug to be (or straight up tell it where it is, even if you're unsure), it will focus on that. Or, make it give you a list of concerns and ask you which are valid. I've found that having local clones of large library repos (or telling it to look in the environment for installed packages) is far more effective than relying on built-in knowledge or lousy web search. It can also use ast-grep on those. For some reason the agent frameworks are still terrible about looking up references in a sane way (where in an IDE you would simply go to the declaration). |
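| As a rough illustration (the paths and file names below are invented, just to show the shape), the kind of extra context I mean looks something like: |

    The bug is almost certainly in src/train.py, not in data loading.
    Assume the dataset at /data/imagenet exists and is intact; do not
    investigate data availability. Library sources are cloned under
    ./vendor/ -- read those (ast-grep is available) instead of relying
    on built-in knowledge of their APIs.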
| |
| ▲ | theshrike79 5 days ago | parent | next [-] | | Context7 MCP is the one I keep enabled for all sessions. Then there are MCPs that give LSP access to the models as well as tools like Crush[0] that have LSPs built in. [0] https://github.com/charmbracelet/crush | |
| ▲ | embedding-shape 4 days ago | parent | prev | next [-] | | Yeah, I do the same, cloning reference repos into known paths and telling it to look there if unsure. Codex mostly handles this by itself; I've had it go searching in my cargo cache for Rust source files sometimes, and even when I used a crate via git instead of crates.io, it went ahead and cloned the repo to /tmp to inspect it properly. Claude Code seems less likely to do that unless you prompt it to; Codex has done it by itself so far. | |
| ▲ | shadyKeystrokes 5 days ago | parent | prev [-] | | [dead] |
|
|
| ▲ | Wowfunhappy 5 days ago | parent | prev | next [-] |
| Sometimes you hit a wall where something is simply outside of the LLM's ability to handle, and it's best to give up and do it yourself. Knowing when to give up may be the hardest part of coding with LLMs. Notably, these walls are never where I expect them to be—despite my best efforts, I can't find any sort of pattern. LLMs can find really tricky bugs and get completely stuck on relatively simple ones. |
| |
| ▲ | ori_b 5 days ago | parent | next [-] | | Doing it yourself is how you build and maintain the muscles to do it yourself. If you only do it yourself when the LLM fails, how will you maintain those muscles? | | |
| ▲ | Wowfunhappy 5 days ago | parent | next [-] | | I agree, and I can actively feel myself slipping (and perhaps more critically, not learning new skills I would otherwise have been forced to learn). It's a big problem, but somewhat orthogonal to "what is the quickest way to solve the task currently in front of me." | | |
| ▲ | kryogen1c 5 days ago | parent | next [-] | | > but somewhat orthogonal to "what is the quickest way to solve the task currently in front of me." That depends on whether you ignore the future. You are never just solving the problem in front of you; you should always act in a way that propagates positivity forward in time. | |
| ▲ | dcow 4 days ago | parent [-] | | Some jobs require investment in the future. Some do not. That’s just reality. Not sure how I feel about it personally, but I think there is a fair amount of the developer trade that is operational. |
| |
| ▲ | ori_b 5 days ago | parent | prev | next [-] | | Which needs to be balanced with "How do I maintain my ability to keep solving tasks quickly?" | |
| ▲ | AbstractH24 4 days ago | parent | prev [-] | | The thing I struggle with is that it feels hard to lock into which skill to learn properly, with so much changing so quickly and it becoming so easy to learn things only superficially. |
| |
| ▲ | RA_Fisher 4 days ago | parent | prev | next [-] | | By moving up a level of abstraction, similar to moving from Assembly to C++ to Python (to LLM). There’s speed in delegation (and in checking, where beneficial). | |
| ▲ | ThrowawayR2 4 days ago | parent [-] | | Moving up abstraction layers really only succeeds with a solid working knowledge of the lower layers. Otherwise, you're just flying blind, operating on faith. A common source of bugs is precisely a result of developers failing to understand the limits of the abstractions they are using. | | |
| ▲ | RA_Fisher 4 days ago | parent | next [-] | | We only need to do that when it’s practical for the task at hand. Some tasks are life-and-death, but many have much lower stakes. | |
| ▲ | AbstractH24 4 days ago | parent | prev [-] | | So we can all only succeed if we know how CPUs handle individual instructions? | | |
| ▲ | Wowfunhappy 4 days ago | parent | next [-] | | I'm not sure whether I agree with GP, but I think you may be misinterpreting their point. I can have an understanding of CPUs in general without knowing individual instructions, and I do think knowing about things like CPU cache is useful even when writing e.g. Python. | | |
| ▲ | jama211 4 days ago | parent | next [-] | | Sure, but the comment being worried about a lack of “flexing your muscles” is perfectly countered by moving up an abstraction layer then, as you don’t have to constantly get into the weeds of coding to maintain an understanding _in general_ without knowing individual instructions. | |
| ▲ | AbstractH24 4 days ago | parent | prev | next [-] | | I see what you’re getting at, and it makes sense. It goes to the larger idea that strategy and logic are important for scalability and long-term success, not just execution. Something LLMs often miss (mostly because people fail to communicate it to them). | |
| ▲ | RA_Fisher 4 days ago | parent | prev [-] | | Yes, for sure! And being able to orchestrate AI to use that knowledge provides leverage for fulfilling tasks. Eventually, yes, I think we'll delegate to AI in more and more complete ways, but it's a process that takes some time. |
| |
| ▲ | monocasa 4 days ago | parent | prev [-] | | There's generally a pretty quick falloff in how much help knowledge of each layer beneath you provides as you go deeper. That being said, if you're writing in C, having a pretty good idea of how a CPU generally executes instructions is pretty key to success, I'd say. | |
| ▲ | AbstractH24 3 days ago | parent [-] | | Agreed, also depends on the scale you are working at. If you are a tiny startup, the marginal gains from these optimizations matter a lot less than if you are Netflix. |
|
|
|
| |
| ▲ | Klathmon 5 days ago | parent | prev [-] | | If the LLM is able to handle it why do you need to maintain those specific skills? | | |
| ▲ | ribosometronome 5 days ago | parent [-] | | Should we not teach kids math because calculators can handle it? Practically, though, how would someone become good at just the skills LLMs don't do well? Much of this discussion is about how that's difficult to predict, but even if you were a reliable judge of what sort of coding tasks LLMs would fail at, I'm not sure it's possible to only be good at that without being competent at it all. | | |
| ▲ | jjmarr 5 days ago | parent | next [-] | | > Should we not teach kids math because calculators can handle it? We don't teach kids how to use an abacus or a slide rule. But we teach positional representations and logarithms. The goal is the theoretical concepts, so you can learn the required skills if necessary. The same will occur with code. You don't need to memorize the syntax to write a for loop or a for-each loop, but you should understand when you might use either and be able to look up how to write one in a given language. | |
| ▲ | ori_b 5 days ago | parent [-] | | Huh. I was taught how to use both an abacus and a slide rule as a kid, in the 90s. |
| |
| ▲ | Klathmon 5 days ago | parent | prev | next [-] | | Should you never use a calculator because you want to keep your math skills high? There's a growing set of problems which, to me, feel like using a calculator for basic math. But school is a whole other thing, and one I'm much more worried about with LLMs, because there's no doubt in my mind I would have abused AI every chance I got if it were around when I was a kid, and I wouldn't have learned a damn thing. | |
| ▲ | ori_b 5 days ago | parent [-] | | I don't use calculators for most math because punching it in is slower than doing it in my head -- especially for Fermi calculations. I will reach for a calculator when it makes sense, but because I don't use a calculator for everything, the number of places where I'm faster than a calculator grows over time. It's not particularly intentional, it just shook out that way. And I hated mental math exercises as a kid. | |
| ▲ | johnisgood 5 days ago | parent [-] | | I do not trust myself, so even if I know how to do mental math, I still use my computer or a calculator just to be sure I got it correct. OCD? Lack of self-trust? No clue. | | |
|
| |
| ▲ | Wowfunhappy 5 days ago | parent | prev [-] | | > I'm not sure it's possible to only be good at that without being competent at it all. This is, in fact, why we teach kids math that calculators could handle! |
|
|
| |
| ▲ | rtpg 5 days ago | parent | prev [-] | | Sure, I agree with the "levels of automation" thought process. But I'm basically experiencing this from the start. If at the first step I'm already dealing with a robot in the weeds, I will have to spend time getting it out of the weeds, all for uncertain results afterwards. Now sometimes things are hard and tricky, and you might still save time... but just on an emotional level, it's unsatisfying |
|
|
| ▲ | solumunus 5 days ago | parent | prev | next [-] |
| Communication with a person is more difficult and the feedback loop is much, much longer. I can almost instantly tell whether Claude has understood the mission or digested context correctly. |
|
| ▲ | bontaq 4 days ago | parent | prev | next [-] |
| I would say a lot of people are only posting their positive experiences. Stating negative things about AI is mildly career-dangerous at the moment, whereas the opposite looks good. I found the results from using it on a complicated code base to be similar to yours, but it is very good at slapping things on until something works. If you're not watching it like a hawk, it will solve a problem in a way that is inconsistent and, importantly, not integrated into the system. Which makes sense: it's been trained to generate code, and it will. |
|
| ▲ | embedding-shape 4 days ago | parent | prev | next [-] |
| > I understand the pitch here ("it finds bugs! it's basically all upside because worst case there's no output anyways"), but I'm finding some of these agents to be ... uhhh... kind of aggressive at trying to find the solution, and they end up missing the forest for the trees. And there's some "oh you should fix this" stuff which, while sometimes not _wrong_, is completely beside the point. How long/big do your system/developer/user prompts typically end up being? The times people seem to get "less than ideal" responses from LLMs tend to be when they're not spending enough time setting up a general prompt they can reuse, describing exactly what they want and do not want. So in your case, you need to steer it to do less outside of what you've told it. Adding things like "Don't do anything outside of what I've just told you" or "Focus only on the things inside <step>", for example, would fix those particular problems, as long as you're not using models that are less good at following instructions (some of Google's models are borderline impossible to stop from adding comments all over the place, as one example). So prompt it not to care about solutions and only to care about finding the root cause, and you'll find that you can mostly avoid the annoying parts by either prescribing what you'd want instead, or just straight up telling it not to do those things. Then you iterate on this reusable prompt across projects, and it builds up so that eventually 99% of the time the models do exactly what you expect. |
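| As a concrete sketch (the wording is mine, not a quote from any vendor's docs), a reusable fragment along those lines might read: |

    Focus only on identifying the root cause; do not propose fixes,
    refactors, or style changes unless asked. Do not touch anything
    outside the files I list. If several causes seem plausible, list
    them and ask me which to pursue before changing any code.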
|
| ▲ | MattGaiser 5 days ago | parent | prev | next [-] |
| Just ask it to prioritize the top ones for your review. Yes, they can bikeshed, but because they don’t have egos, they don’t stick to it. Alternatively, if it is in an area with good test coverage, let it go fix the minor stuff. |
| |
| ▲ | rtpg 5 days ago | parent [-] | | I don't like their fixes, so now I'm dealing with imperfect fixes to problems I don't care about. Tedium |
|
|
| ▲ | j2kun 5 days ago | parent | prev | next [-] |
| > except at least the junior is a person. +1 Juniors can learn over time. |
|
| ▲ | SV_BubbleTime 5 days ago | parent | prev | next [-] |
| Ok, fair critique. EXCEPT… What did you have for AI three years ago? Jack fucking shit is what. Why is “wow that’s cool, I wonder what it’ll turn into” a forbidden phrase, but “there are clearly no experts on this topic but let me take a crack at it!!” important for everyone to comment on? One word: Standby. Maybe that’s two words. |
| |
| ▲ | advael 5 days ago | parent | next [-] | | With all due respect, "wow this is cool, I wonder what it'll turn into" is basically the mandatory baseline stance to take. I'm lucky that's where I'm still basically at, because anyone in a technical position who shows even mild reticence beyond that is likely to be unable to hold a job in the face of their bosses' frothing enthusiastic optimism about these technologies | | |
| ▲ | dns_snek 5 days ago | parent [-] | | Is it that bad out there? Yeah, I don't think I could last in a job that tries to force these tools into my workflow. | | |
| ▲ | bravetraveler 5 days ago | parent | next [-] | | Drive-by comment: it's not so bad here. I work with a few peers who've proven to be evangelists with much stronger social skills. When the proposition comes up, I ask how my ass should be cleaned, too. Thankfully, the bosses haven't heard/don't care. Varying degrees of 'force' at play; I'm lucky that nobody significant is minding my [absence of] LLM usage. Just some peers excited to do more for the same or, arguably, less reward. Remember: we're now all in an arms race. Some of us had a head start. How crassly I respond to the suggestion depends on its delivery/relevance to my process/product, of course. They may be placated like a child with a new toy... or get the gross question that, hopefully, expresses the suggestion isn't wanted, needed, or welcome. Faced with a real mandate, I'd feed it garbage while looking for new work. Willing to bet I can beat enough machines while people are still involved at all. | |
| ▲ | advael 4 days ago | parent | prev [-] | | You can get pretty far by mostly just claiming to use it "when it makes sense" but you do meet people who are very pushy about it. Hoping that calms down as knowledge of the downsides becomes more widespread |
|
| |
| ▲ | j2kun 5 days ago | parent | prev [-] | | Careful there, ChatGPT was initially released November 30, 2022, which was just about 3 years ago, and there were coding assistants before that. If you find yourself saying the same thing every year and adding 1 to the total... |
|
|
| ▲ | adastra22 5 days ago | parent | prev [-] |
| So you feed the output into another LLM call to re-evaluate and assess, until the number of actual reports is small enough to be manageable. Will this result in false negatives? Almost certainly. But what does come out the end of it has a higher prior for being relevant, and you just review what you can. Again, worst case all you wasted was your time, and now you've bounded that. |
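| A minimal sketch of that second-pass triage, assuming a generic Python setup (run_llm is a placeholder for whatever model client you actually use, not a real API): |

    # Second-pass triage of findings from a first LLM review run.
    # run_llm() is a stand-in for your actual model client.
    def run_llm(prompt: str) -> str:
        raise NotImplementedError("wire up your model client here")

    def triage(findings: list[str], keep_top: int = 5) -> list[str]:
        kept = []
        for finding in findings:
            verdict = run_llm(
                "Answer only YES or NO: is the following a real, "
                "actionable bug rather than a style nitpick?\n\n" + finding
            )
            if verdict.strip().upper().startswith("YES"):
                kept.append(finding)
        # Bound your own review effort: surface only the first few survivors.
        return kept[:keep_top]

| Anything the filter drops is a potential false negative, but what survives is cheap enough to review by hand. |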