| starkparker 12 hours ago |
some models and harnesses are good at some things and bad at others. some HN users are good at using some models and harnesses and bad at using others. some engineers at your company understand what tools to use for their work and others don't. some programming languages are better resourced to support agents than others.

most generative tools are good at tasks and bad at architecture. effective QA of generated code is still at an infant stage no matter what anyone claims, much less automated effective QA; teams that care deeply about getting things right the first time will have a worse time with it on the median than teams that don't.

don't trust the code it generates, but "don't trust" doesn't mean discard it. don't trust the architecture it invents for a problem; it can't reason at that level. it's literally aping the background noise of the entire internet and building things that are dead-center mediocre.

there's not enough specifics in your wall of text to help me point out what's going on in any useful detail, though. i will say that HN users are more likely to be ecstatic at building an MVP that they never have to support; the scale of a company where you have 6 years experience but are still new on your team is bigger than where most of HN lives and works.

the dissonance would be the same if it was 10 years ago, LLMs didn't meaningfully exist, and you'd be on here asking HN why some of these so-called lean teams everyone's posting about all of a sudden seem to be so much more productive than when your boss at BiggerCorp tried to streamline processes and then got yelled at by CS, sales enablement, and marketing.
| didigamma 11 hours ago | parent |
Thanks for the perspective! I guess I don't interact with the wider dev world IRL since almost all my people are in non-tech fields.

> there's not enough specifics in your wall of text to help me point out what's going on in any useful detail, though.

Sorry, wasn't sure what was most valuable to people reading this. Some examples:

1. It feels like quality code (which our team cares about a lot) is hard to get out of LLMs unless it's just piping data around or copying an example I explicitly point to. It doesn't get the scale it's building at: often it writes weirdly generic code that's not really relevant, or wants to write a lot of boilerplate that isn't useful (e.g. spamming tests).

2. It hallucinates documentation often enough that I've gone back to hunting for source documentation myself (e.g. AWS details).

3. It will flag false positives on junior-engineer "problems" when I request a code review. If there's data being mutated in two different places under different conditions, chances are it won't understand it the first time (toy example in the P.S. below).

4. It'll get stuck on nonsense (the "thinking" output makes me cringe) and wander off in random directions when I ask it to debug a problem. I don't think I've had it find the actual problem even once (though it has found a couple of unrelated problems, which is nice).

5. In Plan/Build mode, it won't follow the plan. It also seems to oddly dodge writing secure code for auth-related stuff; if I hadn't read through OAuth myself, I wouldn't have caught it.

> tried to streamline processes and then got yelled at by CS, sales enablement, and marketing.

Yes haha! And a few other departments besides. Anyways, thanks for your time :)
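P.S. To make (3) concrete, here's a toy sketch of the kind of pattern I mean (hypothetical names, nothing like our actual code): the same field gets written in two places, but the guard conditions make the writes mutually exclusive, and the review bot still reads each write in isolation and flags a conflict.

    from dataclasses import dataclass

    # hypothetical example with invented names, just to illustrate the shape
    @dataclass
    class Order:
        total: float
        flagged: bool
        status: str = "new"

    def apply_discount(order: Order) -> None:
        # only ever writes status for UNflagged orders
        if order.total > 100 and not order.flagged:
            order.status = "discounted"

    def flag_for_review(order: Order) -> None:
        # only ever writes status for flagged orders, so the `flagged`
        # guard keeps the two writes mutually exclusive for a given state
        if order.flagged and order.status != "discounted":
            order.status = "needs_review"

Both mutations are intentional and can't clobber each other, but the review keeps reporting "status is set in two places" as if it were a bug.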