| ▲ | GolDDranks 7 hours ago |
| I feel like I'm taking crazy pills. The article starts with: > you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed. That has _never_ been the story for me. I've tried, and I've gotten some good pointers and hints about where to go and what to try (a result of LLMs' extensive if shallow reading), but in the sense of concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten a satisfactory code/script result from them without a tremendous amount of pushback: "do this part again with ...", do that, don't do that. Maybe I'm just a crank with too many preferences, but I hardly think so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, right? But if you've got a problem where a simple, contained feedback loop isn't easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models. |
|
| ▲ | b33j0r 7 hours ago | parent | next [-] |
| I usually do most of the engineering and it works great for writing the code. I’ll say: > There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop. > The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument. > Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests). |
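(A minimal sketch of the structure that prompt describes, for readers who want to picture the result. The prompt clearly targets a language with explicit allocators, so the ownership and copy-on-pop rules are paraphrased away here; this TypeScript rendering shows only the data-structure shape, and every name in it is an assumption, not the commenter's actual code.)

    // Hypothetical rendering of the described TaskManager: tasks kept sorted
    // by deadline, pop removes and returns the earliest-deadline task.
    interface Task {
      deadline: number;                 // sort key, e.g. epoch millis
      context: unknown;                 // the "pointer to a context" payload
      run: (context: unknown) => void;  // function taking that context
    }

    class TaskManager {
      private tasks: Task[] = [];       // kept sorted ascending by deadline

      add(task: Task): void {
        // binary-search insertion keeps the array sorted by deadline
        let lo = 0, hi = this.tasks.length;
        while (lo < hi) {
          const mid = (lo + hi) >> 1;
          if (this.tasks[mid].deadline < task.deadline) lo = mid + 1;
          else hi = mid;
        }
        this.tasks.splice(lo, 0, task);
      }

      pop(): Task | undefined {
        // the caller receives the task; it is removed from the manager
        return this.tasks.shift();
      }
    }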
| |
| ▲ | logicprog an hour ago | parent | next [-] | | Yeah, I feel like I get really good results from AI, and this is very much how I prompt as well. It just takes care of writing the code, making sure to update everything touched by that code (guided by linters and type-checkers), but it's always executing my architecture and algorithm, and I spend time carefully trying to understand the problem before I even begin. | |
| ▲ | gedy 7 hours ago | parent | prev | next [-] | | What you’re describing makes sense, but that type of prompting is not what people are hyping | | |
| ▲ | Leherenn 6 hours ago | parent | next [-] | | I haven't tried it, but someone at work suggested using voice input for this because it's so much easier to add details and constraints. I can certainly believe it, but I hate voice interfaces, especially if I'm in an open space setting. You don't even have to be as organised as in the example, LLMs are pretty good at making something out of ramblings. | |
| ▲ | ljm 6 hours ago | parent | prev [-] | | The more accurate prompt would be “You are a mind reader. Create me a plan to create a task manager, define the requirements, deploy it, and tell me when it’s done.” And then you just rm -rf and repeat until something half works. | | |
| ▲ | varispeed 3 hours ago | parent [-] | | "Here are login details to my hosting and billing provider. Create me a SaaS app where customers could rent virtual pets. Ensure it's AI and blockchain and looks inviting and employ addictive UX. I've attached company details for T&C and stuff. Ensure I start earning serious money by next week. I'll bump my subscription then if you deliver, and if not I will delete my account. Go!" |
|
| |
| ▲ | varispeed 3 hours ago | parent | prev | next [-] | | This is a good start. I write prompts as if I were instructing a junior developer to do the things I need. I make them as detailed and clear as I can. I actually don't like _writing_ code, but enjoy reading it. So sessions with an LLM are very entertaining, especially when I want to push boundaries ("I am not liking this, the code seems a little bit bloated. I am sure you could simplify X and Y. Also think of any alternative approach that you reckon will be more performant that maybe I don't know about"). Etc. This doesn't save me time, but it makes work so much more enjoyable. | | |
| ▲ | logicprog 43 minutes ago | parent [-] | | > I actually don't like _writing_ code, but enjoy reading it. I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly. I don't like moving text around; munging control structures from one shape to another; I don't like the busy work of copying and pasting code that isn't worth DRYing, or isn't capable of being DRY'd effectively; I hate going around and manually fixing all the little compiler and linter errors produced by a refactor; and I really hate the process of filling out the skeleton of a type/class/whatever architecture in a new file before getting to the meat. However, reading code is pretty easy for me, and I'm very good at quickly putting algorithms and architectures I have in my head into words — and, to be honest, I often find this clarifies the high level idea more than writing the code for it, because I don't get lost in the forest — and I also really enjoy taking something that isn't quite good enough, that's maybe 80% of the way there, and doing the careful polishing and refactoring necessary to get it to 100%. |
| |
| ▲ | apercu 6 hours ago | parent | prev [-] | | This is similar to how I prompt, except I start with a text file and design the solution, then paste it into an LLM after I have read it a few times. Otherwise, if I type directly into the LLM and make a mistake, it tends to come back and haunt me later. |
|
|
| ▲ | threethirtytwo 6 hours ago | parent | prev | next [-] |
| I think it’s usage patterns. It is you, in a sense. You can't deny that someone like Ryan Dahl, the creator of Node.js, declaring that he no longer writes code is objectively contrary to your own experience. Something is different. I think you and other deniers try one prompt, see the issues, and stop. Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes, and you keep iterating and monitoring the child until it makes what you want. The first output is almost always not what you want. It is the feedback loop between you and the AI that creates something better than either half of the human-AI partnership could alone. |
| |
| ▲ | GorbachevyChase 5 hours ago | parent | next [-] | | My personal suspicion is that the detractors value process and implementation details much more highly than results. That would not surprise me if you come from a business that is paid for its labor inputs and is focused on keeping a large team billable for as long as possible. But I think hackers and garage coders see the value of “vibing” as they are more likely to be the type of people who just want results and view all effort as margin erosion rather than the goal unto itself. The only thing I would change about what you said is, I don’t see it as a child that needs tutoring. It feels like I’m outsourcing development to an offshore consultancy where we have no common understanding, except the literal meaning of words. I find that there are very, very many problems that are suited well enough to this arrangement. | |
| ▲ | CivBase 5 hours ago | parent | prev [-] | | > Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want. Who are you people who spend so much time writing code that this is a significant productivity boost? I'm imagining doing this with an actual child and how long it would take for me to get a real return on investment at my job. Nevermind that the limited amount of time I get to spend writing code is probably the highlight of my job and I'd be effectively replacing that with more code reviews. | | |
| ▲ | threethirtytwo 5 hours ago | parent | next [-] | | It's not just writing code. And maybe a child is too simplistic an analogy. It's more like working with a savant. The kind of thing you can tell the AI to do is this: you tell it to code a website... it does it, but you don't like the pattern. Say "use functional programming", "use camel case", "don't use this pattern", "don't use that". And then it does it. You can leave those instructions in the agent file and they become burned into it forever. | |
| ▲ | dimitri-vs 5 hours ago | parent | prev | next [-] | | A better way to put it is with this example: I put my symptoms into ChatGPT and it gives some generic info with a massive "not-medical-advice" boilerplate and refuses to give specific recommendations. My wife (an NP) puts in anonymous medical questions and gets highly specific, med-terminology-heavy guidance. That's all to say the learning curve with LLMs is how to say things a specific way to reliably get an outcome. | |
| ▲ | boredtofears 3 hours ago | parent | prev | next [-] | | Here's an example: I recently inherited a web project over a decade old, full of EOL'd libraries and OS packages, that desperately needed to be modernized. Within 3 hours I had a working test suite with 80% code coverage on core business functionality (~300 tests). Now - maybe the tests aren't the best designs, given there is no way I could review that many tests in 3 hours, but I know empirically that they cover a majority of the core logic. We can now incrementally upgrade the project and have at least some kind of basic check along the way. There's no way I could have pieced together as large a working test suite using tech of that era in even double that time. | | |
| ▲ | draebek 2 hours ago | parent [-] | | You know they cause a majority of the code of the core logic to execute, right? Are you sure the tests actually check that those bits of logic are doing the right thing? I've had Claude et al. write me plenty of tests that exercise things and then explicitly swallow errors and pass. | | |
| ▲ | boredtofears an hour ago | parent [-] | | Yes, the first hour or so was spent fiddling with test creation. It started out doing its usual wacky behavior, like checking for the existence of a method and calling that a "pass", creating a mock object that mocked the return result of the very logic it was supposed to be testing, and (my favorite) copying the logic out of the code and putting it directly into the test. Lots of course correction, but once I had one well-written test that I had fully proofed myself, I just provided that test as an example and it did a pretty good job following those patterns for the remainder.
I still sniffed through all the output for LLM wackiness, though. Using a code coverage tool also helps a lot. |
|
| |
| ▲ | shimman 5 hours ago | parent | prev [-] | | These people are just the same charlatans and scammers you saw in the web3 sphere. Invoking Ryan Dahl as some sort of authority figure and not a tragic figure that sold his soul to VC companies is even more pathetic. | | |
| ▲ | threethirtytwo 5 hours ago | parent [-] | | I don't appreciate this comment. Calling me a charlatan is rude. He's not an authority, but he has more credibility than you and most people on HN. There is an obvious division of ideas here. But calling one side stupid, or referring to them as charlatans, is outright wrong and biased. | |
| ▲ | shimman 4 hours ago | parent [-] | | No one called YOU a charlatan, get thicker skin because you are going to run into more and more people that absolutely hate these tools. There is a reason why they struggle selling them and executives are force feeding them to their workers. Charlatan is the perfect term for those that stand to make money selling half baked goods and forcing more mass misery upon society. |
|
|
|
|
|
| ▲ | giancarlostoro 3 hours ago | parent | prev | next [-] |
| The secret sauce for me is Beads. Once Beads is set up, you make the tasks and refine them, and by the end each task is a very detailed prompt. I have Claude ask me clarifying questions, do research on best practices, etc. Because of Beads I can have Claude do a code review for serious bugs and issues, and sure enough it finds some interesting things I overlooked. I have also seen my peers in the reverse engineering field make breakthroughs emulating runtimes for which no (or only limited) emulation previously existed, built from the ground up, mind you. I think the key is thinking of yourself as an architect / mentor for a capable and promising junior developer. |
|
| ▲ | jasondigitized 7 hours ago | parent | prev | next [-] |
| I feel like I am taking crazy pills. I am getting code that works from Opus 4.5. It seems like people are living in two separate worlds. |
| |
| ▲ | ruszki 6 hours ago | parent | next [-] | | Working code doesn’t mean the same thing to everyone. My coworker just started vibe coding. Her code works… on happy paths. It absolutely doesn’t work when any kind of error happens. It’s also absolutely impossible to refactor in any way. She thinks her code works. The same coworker was asked to update a service to Spring Boot 4. She made a blog post about it. She used an LLM for it. So far every point I’ve read in it has been a lie, and her workarounds make tests, for example, needlessly less readable. So yeah, “it works”, until it doesn’t, and when it hits you, you end up doing more work in total, because there are more obscure bugs, and fixing them is harder because of the terrible readability. | |
| ▲ | WarmWash 6 hours ago | parent | prev | next [-] | | I can't help but think of my earliest days of coding, 20-ish years ago, when I would post my code online looking for help on a small thing and be told that my code was garbage and didn't work at all, even when it was actually working. There are many ways to skin a cat, and in programming the happens-in-a-digital-space aspect removes seemingly all boundaries, leading to fractal ways to "skin a cat". A lot of programmers have hard heads and are sure they know the right way to do something. These are the same guys who criticized every other senior dev as a bad/weak coder long before LLMs were around. | |
| ▲ | crystal_revenge 6 hours ago | parent | prev | next [-] | | Parent's profile shows that they are an experienced software engineer in multiple areas of software development. Your own profile says you are a PM whose software skills amount to "Script kiddie at best but love hacking things together." It seems like the "separate worlds" you are describing are really the impressions of a seasoned engineer vs. an amateur reviewing the same code base. It shouldn't be even a little surprising that the code looks much better to you than it does to a more experienced developer. At least in my experience, learning to quickly read a code base is one of the later skills a software engineer develops. Generally only very experienced engineers can dive into an open source code base to answer questions about how the library works and is used (typically, most engineers need documentation to aid them in this process). I mean, I've dabbled in home plumbing quite a bit, but if AI instructed me to repair my pipes and I thought the result "looked great!" while an experienced plumber's response was "ugh, this doesn't look good to me, lots of issues here", I wouldn't argue there are "two separate worlds". | | |
| ▲ | jasondigitized 24 minutes ago | parent | next [-] | | Except I work with extremely competent software engineers on software used in mission-critical applications in the Fortune 500. I call myself a script kiddie because I did not study computer science. Am I green in the test run? Does it pass load tests? Is it making money? While some of y'all are worried about leaky abstractions, we just closed another client. Two worlds for sure, where one team is skating to where the puck is going, looking to raise cattle, while another wants to keep nurturing an exotic pet. Plenty of respect for the craft of code, but the AI of today is the worst it is ever going to be. | |
| ▲ | 4 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | ModernMech 4 hours ago | parent | prev [-] | | > It shouldn't be even a little surprising that your impression of the result is that the code is much better looking than the impression of a more experienced developer. This really is it: AI produces bad to mediocre code. To someone who produces terrible code mediocre is an upgrade, but to someone who produces good to excellent code, mediocre is a downgrade. | | |
| ▲ | jasondigitized 5 minutes ago | parent [-] | | Today. It produces mediocre code today. That is really it. What is the quality of that code compared to 1 year ago. What will it be in 1 year? Opus 6.5 is inevitable. |
|
| |
| ▲ | HarHarVeryFunny 5 hours ago | parent | prev | next [-] | | That is such a vague claim that there is no contradiction. Getting code to do exactly what, based on using and prompting Opus in what way? Of course it works well for some things. | |
| ▲ | GoatInGrey 6 hours ago | parent | prev | next [-] | | That's a significant rub with LLMs, particularly hosted ones: the variability. Add in quantization, speculative decoding, and dynamic adjustment of temperature, nucleus sampling, attention head count, & skipped layers at runtime, and you can get wildly different behaviors with even the same prompt and context sent to the same model endpoint a couple hours apart. That's all before you even get to all of the other quirks with LLMs. | |
| ▲ | zeroCalories 6 hours ago | parent | prev [-] | | It depends heavily on the scope and type of problem. If you're putting together a standard isolated TypeScript app from scratch it can do wonders, but many large systems are spread between multiple services, use abstractions unique to the project, and are generally dealing with far stricter requirements. I couldn't depend on Claude to do some of the stuff I'd really want, like refactoring the code shared between six massive files without breaking tests. The space where I can have it work productively is still fairly limited. |
|
|
| ▲ | jjice 6 hours ago | parent | prev | next [-] |
| I've found that the thing that made it really click for me was having reusable rules (each agent accepts these differently) that tell it the patterns and structure you want. I have ones that describe which kinds of functions get unit vs. integration tests, how to structure them, and the general kinds of test cases to check for (they love writing way too many tests IME). It has reduced the back and forth where I tell the LLM to correct something. Usually the first time it does something I don't like, I have it correct it. Once it's in a satisfactory state, I tell it to write a Cursor rule describing the situation BRIEFLY (it gets way too verbose by default) and how to structure things. That has made writing code with LLMs so much more enjoyable for me. |
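(As a concrete illustration of the kind of reusable rule being described, here is a minimal sketch of a Cursor rule file. Recent Cursor versions keep these under .cursor/rules/ as Markdown files with front matter; the exact fields and everything in the body below are assumptions for illustration, not the commenter's actual rules.)

    ---
    description: How tests are structured in this repo
    globs: ["src/**/*.test.ts"]
    alwaysApply: false
    ---
    - Pure functions get unit tests; anything touching the database or network gets an integration test.
    - One describe block per function under test: cover the happy path, one edge case, and one failure case. No more.
    - Do not add tests for trivial getters/setters or framework glue.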
|
| ▲ | Balinares 2 hours ago | parent | prev | next [-] |
| Nah, I'm with you there. I've yet to see even Opus 4.5 produce something close to production-ready -- in fact Opus seems like quite a major defect factory, given its consistent tendency toward hardcoding case by case workarounds for issues caused by its own bad design choices. I think uncritical AI enthusiasts are just essentially making the bet that the rising mountains of tech debt they are leaving in their wake can be paid off later on with yet more AI. And you know, that might even work out. Until such a time, though, and as things currently stand, I struggle to understand how one can view raw LLM code and find it acceptable by any professional standard. |
|
| ▲ | ActorNightly 6 hours ago | parent | prev | next [-] |
| It's really becoming a good litmus test of someone's coding ability whether they think LLMs can do well on complex tasks. For example, someone may ask an LLM to write a simple HTTP web server, and it can do that fine, and they consider that complex, when in reality it really isn't. |
| |
| ▲ | threethirtytwo 6 hours ago | parent [-] | | It’s not. There are tons of great programmers, big names in the industry, who now exclusively vibe code. Many of them are obviously intelligent and great programmers. This is an extremely false statement. | |
| ▲ | ActorNightly 32 minutes ago | parent | next [-] | | Non sequitur. You don't have to be bad at coding to use LLMs. The argument was specifically about thinking that LLMs can be great at accomplishing complex tasks (which they are not). | |
| ▲ | HarHarVeryFunny 5 hours ago | parent | prev [-] | | People use "vibe coding" to mean different things - some mean the original Karpathy "look ma, no hands!", feel the vibez, thing, and some just (confusingly) use "vibe coding" to refer to any use of AI to write code, including treating it as a tool to write small well-defined parts that you have specified, as opposed to treating it as a magic genie. There also seem to be people hearing big names like Karpathy and Linus Torvalds say they are vibe coding on their hobby projects, meaning who knows what, and misunderstanding this as being an endorsement of "magic genie" creation of professional quality software. Results of course also vary according to how well what you are asking the AI to do matches what it was trained on. Despite sometimes feeling like it, it is not a magic genie - it is a predictor that is essentially trying to best match your input prompt (maybe a program specification) to pieces of what it was trained on. If there is no good match, then it'll have a go anyway, and this is where things tend to fall apart. | | |
| ▲ | dudeinhawaii 4 hours ago | parent | next [-] | | Funny, the last interview I watched with Karpathy, he highlighted the way the AI/LLM was unable to think in a way that aligned with his codebase. He described vibe-coding a transition from Python to Rust but specifically called out that he hand-coded all of the Python code due to weaknesses in LLMs' ability to handle performant code. I'm pretty sure this was the last Dwarkesh interview with "LLMs as ghosts". | |
| ▲ | HarHarVeryFunny 4 hours ago | parent [-] | | Right, and he also very recently said that he felt essentially left behind by AI coding advances, thinking that his productivity could be 10x if he knew how to use it better. It seems clear that Karpathy himself is well aware of the difference between "vibe coding" as he defined it (which he explicitly said was for playing with on hobby projects), and more controlled productive use of AI for coding, which has either eluded him, or maybe his expectations are too high and (although it would be surprising) he has not realized the difference between the types of application where people are finding it useful, and use cases like his own that do not play to its strength. |
| |
| ▲ | threethirtytwo 5 hours ago | parent | prev [-] | | Karpathy is biased. I wouldn't use his name, as he's behind the whole vibe coding movement. You have to pick people with nothing to gain. https://x.com/rough__sea/status/2013280952370573666 | |
| ▲ | HarHarVeryFunny 4 hours ago | parent [-] | | I don't think he meant to start a movement - it was more of a throw-away tweet that people took way too seriously, although maybe with his bully pulpit he should have realized that would happen. |
|
|
|
|
|
| ▲ | dev_l1x_be 7 hours ago | parent | prev | next [-] |
| Well one way of solving this is to keep giving it simple tasks. |
| |
| ▲ | GoatInGrey 6 hours ago | parent | next [-] | | The other side of this coin is the non-developer stakeholders who Dunning-Kruger themselves into firm conclusions on technical subjects with LLMs. "Well, I can code this up in an hour, two max. Why is it taking you ten hours?" I've (anecdotally) even had project sponsors approach me with an LLM's judgement of their working relationship with me as if it were gospel, like "It said that we aren't on the same page. We need to get aligned." It gets weird. These cases are common enough that it's more systemic than isolated. | |
| ▲ | hmaxwell 7 hours ago | parent | prev [-] | | Exactly. 100%. I read these comments and articles and feel like I am completely disconnected from most people here. Why not use GenAI the way it actually works best: like autocomplete on steroids. You stay the architect, and you have it write code function by function. Don't show up in Claude Code or Codex asking it to "please write me GTA 6 with no mistakes or you go to jail, please." It feels like a lot of people are using GenAI wrong. | |
|
|
| ▲ | nozzlegear 6 hours ago | parent | prev | next [-] |
| You're not taking crazy pills; this is my exact experience too. I've been building my wife's eCommerce shop (a headless Medusa instance, which has pretty good docs and even its own documentation LLM) as a 100% vibe-coded project using Claude Code, and it has been one comedy of errors after another. I can't tell you how many times I've had it go through the loop of: Cart + Payment Collection link is broken -> Redeploy -> Webhook is broken (can't find payment collection) -> Redeploy -> Cart + Payment Collection link is broken -> Repeat. And it never seems to remember the reasons it had done something previously – despite it being plastered 8000 times across the CLAUDE.md file – so it bumbles into the same fuckups over and over again. A complete exercise in frustration that has turned me off of all agentic code bullshit. The only reason I still have Claude Code installed is because I like the `/multi-commit` skill I made. |
|
| ▲ | SCdF 7 hours ago | parent | prev | next [-] |
| I am getting workable code with Claude on a 10k-LOC TypeScript project. I ask it to make plans, then execute them step by step. I have yet to try something larger, or something more obscure. |
| |
| ▲ | brabel 6 hours ago | parent | next [-] | | Most agents do that by default now. | | |
| ▲ | GoatInGrey 6 hours ago | parent [-] | | I feel like there is a nuance here. I use GitHub Copilot and Claude Code, and unless I tell it to not do anything, or explicitly enable a plan mode, the LLM will usually jump straight to file edits. This happens even if I prompt it with something as simple as "Remind me how loop variable scoping works in this language?". |
| |
| ▲ | jasondigitized 7 hours ago | parent | prev [-] | | This. I feel like folks are living in two separate worlds. You need to narrow the aperture and take the LLM through discrete steps. Are people just saying it doesn't work because they are pointing it at 1M-LOC monoliths and trying to one-shot a giant epic? | |
| ▲ | nh23423fefe 6 hours ago | parent | next [-] | | AI was useless for me on a refactor of a 20k-LOC repo, even after I gave examples of the migrations I wanted in commits. It would correctly modify a single method; I would ask it to repeat that for the next one and it would fail. The code that our contractors are submitting is trash and very high LOC. When you inspect it, you can see that the unit tests are testing nothing of value:
    when(mock.method(foo)).thenReturn(bar)
    assert(bar == bar)
Stuff like that. It's all fake coverage, for fake tests, for fake OKRs. What are people actually getting done? I've sat next to our top evangelist for 30 minutes pair programming, and he just fought the tool, saying something was wrong with the db, while showing off some UI I don't care about. That seems to be the real issue to me: I never bother wasting time with UI and just write a tool to get something done, but people seem impressed that AI did some shitty data binding to a data model that can't do anything, but it's pretty. It feels weird being an avowed singularitarian but adamant that these tools suck now. |
| ▲ | echelon 6 hours ago | parent | prev [-] | | I'm using Claude in a giant Rust monorepo. It's really good at implementing HTTP handlers and threaded workers when I point it at prior examples. |
|
|
|
| ▲ | __grob 5 hours ago | parent | prev | next [-] |
| It still amazes me that so many people can see LLMs writing code as anything less than a miracle in computing... |
| |
| ▲ | Balinares 2 hours ago | parent [-] | | I mean, a trained dog who plays the piano is a miracle in canine education, until such a point where you assess the quality of its performance. |
|
|
| ▲ | echohack5 7 hours ago | parent | prev | next [-] |
| I have found AI great in a lot of scenarios, but if I have a specific workflow, then the answer is specific and the AI will get it wrong 100% of the time. You have a great point here. A trivial example is your happy-path git workflow. I want:
- pull main
- make a new branch in user/feature format
- commit, always signed with my ssh key
- push
- open a PR
but it always will:
- not sign commits
- not pull main
- not know to rebase if changes are in flight
- make a million unnecessary commits
- not squash when making a million unnecessary commits
- have no guardrails when pushing to main (oops!)
- add too many comments
- write commit messages that are too long
- spam the PR description with hallucinated test plans
- incorrectly attribute itself as co-author in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright --- and AIs can't own copyright)
- not make DCO-compliant commits
...
Commit spam is particularly bad for bisect bug hunting and ref performance issues at scale. Sure, I can enforce Squash and Merge on my repo, but why am I relying on that if the AI is so smart? All of these things are fixed with aliases / magit / CLI usage, using the thing the way we have always done it. |
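(For reference, the happy path in that wish list is only a handful of commands by hand. A minimal sketch, assuming git 2.34+ with SSH commit signing configured as shown, the GitHub CLI installed, and placeholder branch/message names:)

    # one-time setup so commits are signed with an existing SSH key
    git config --global gpg.format ssh
    git config --global user.signingkey ~/.ssh/id_ed25519.pub
    git config --global commit.gpgsign true

    # the happy-path workflow itself
    git checkout main && git pull
    git checkout -b user/feature                   # placeholder branch name
    git add -A && git commit -m "Short summary"    # signed via the config above
    git push -u origin user/feature
    gh pr create --fill                            # opens the PR from the pushed branch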
| |
| ▲ | ikrenji 6 hours ago | parent | next [-] | | Is commit history that useful? I never wanted to look up anything in it that couldn't be solved with git log | grep xyz... | |
| ▲ | furyofantares 7 hours ago | parent | prev [-] | | > why am I relying on that if the AI is so smart? Because it's not? I use these things very extensively to great effect, and the idea that you'd think of it as "smart" is alien to me, and seems like it would hurt your ability to get much out of them. Like, they're superhuman at breadth and speed and some other properties, but they don't make good decisions. |
|
|
| ▲ | causalscience 3 hours ago | parent | prev | next [-] |
| You're not crazy, I'm also always disappointed. My theory is that the people who are impressed are trying to build CRUD apps or something like that. |
| |
|
| ▲ | GolDDranks 7 hours ago | parent | prev | next [-] |
| Just a supplementary fact: I'm in the fortunate position, compared to the AI, that in cases where it's hard to provide that automatic feedback loop, I can run and test the code at my discretion, whereas the AI model can't. Yet. But most of my criticism comes not from running the code, but from _reading_ it. It wrote code. I read it. And I am not happy with it. No need even to run it; it's shit at a glance. |
| |
| ▲ | elevation 7 hours ago | parent | next [-] | | Over the weekend I generated a for-home-use-only PHP app with a popular CLI LLM product. The app met all my requirements, but the generated code was mixed. It correctly used a prepared query to avoid SQL injection. But then, instead of an obvious:
    "SELECT * FROM table WHERE id = 1;"
it gave me:
    $result = $db->query("SELECT * FROM table;");
    foreach ($result as $row)
        if ($row["id"] == 1)
            return $row;
With additional prompting I arrived at code I was comfortable deploying, but this kind of flaw cuts into the total time-savings. | |
| ▲ | ReverseCold 7 hours ago | parent | prev | next [-] | | > I can run and test the code at my discretion, whereas the AI model can't. It sounds like you know what the problem with your AI workflow is? Have you tried using an agent? (sorry somewhat snarky but… come on) | | |
| ▲ | GolDDranks 7 hours ago | parent [-] | | Yeah, you're right, and the snark might be warranted. I should consider it the same as my stupid (but cute) robot vacuum cleaner that goes in random directions but gets the job done. The thing that differentiates LLMs from my stupid-but-cute vacuum cleaner is that the AI model (at least OpenAI's) is cocksure and wrong, which is infinitely more infuriating than being a bit clueless and wrong. | |
| ▲ | storystarling 7 hours ago | parent | next [-] | | I've been trying to solve this by wrapping the generation in a LangGraph loop. The hope was that an agent could catch the errors, but it seems to just compound the problem. You end up paying for ten API calls where the model confidently doubles down on the mistake, which gets expensive very quickly for no real gain. | |
| ▲ | yaur 7 hours ago | parent | prev [-] | | Give Claude Code a go. It still makes a lot of stupid mistakes, but it's a vastly different experience from pasting back and forth with ChatGPT. | |
| ▲ | tayo42 6 hours ago | parent [-] | | There's no free trial or anything? | | |
| ▲ | yaur 5 hours ago | parent [-] | | You can play with the model for free in chat... but if $20 for a coding agent isn't effectively free for your use case, it might not be the right tool for you. ETA: I've probably gotten 10k worth of junior dev time out of it this month. | |
| ▲ | tayo42 5 hours ago | parent [-] | | The chat is limited and doesn't let you use the latest model. If that's representative of the answers I would get by paying, it doesn't seem worth it. I'm not crazy about signing up for a subscription service; it depends on you remembering to cancel and not having a headache when you do cancel. |
|
|
|
|
| |
| ▲ | __MatrixMan__ 7 hours ago | parent | prev [-] | | You might get better code out of it if you give the AI some more restrictive handcuffs. Spin up a tester instance and have it tell the developer instance to try again until it's happy with the quality. |
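(A minimal sketch of the shape of that loop. The callModel function below is a hypothetical stand-in for whatever LLM API or agent framework is actually in use; the prompts and the APPROVED convention are assumptions for illustration.)

    // Hypothetical generate/critique loop: a "developer" model writes code,
    // a "tester" model critiques it, and we iterate until the tester approves.
    type CallModel = (role: "developer" | "tester", prompt: string) => Promise<string>;

    async function developWithReview(
      callModel: CallModel,
      task: string,
      maxRounds = 5
    ): Promise<string> {
      let code = await callModel("developer", `Write code for: ${task}`);
      for (let round = 0; round < maxRounds; round++) {
        const review = await callModel(
          "tester",
          `Review this code for "${task}". Reply APPROVED if acceptable; otherwise list concrete problems.\n\n${code}`
        );
        if (review.includes("APPROVED")) break; // the tester instance is satisfied
        code = await callModel(
          "developer",
          `Revise the code to address this review:\n${review}\n\nCurrent code:\n${code}`
        );
      }
      return code; // best effort after maxRounds, even if never approved
    }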
|
|
| ▲ | t55 7 hours ago | parent | prev [-] |
| [flagged] |
| |
| ▲ | GolDDranks 7 hours ago | parent [-] | | I don't love these kinds of throwaway comments without any substance, but... "It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It" ...might be my issue indeed. Trying to balance it by not being too stubborn though. I'm not doing AI just to be able to dump on them, you know. | | |
| ▲ | ahelwer 6 hours ago | parent | next [-] | | An alternative reading of these comments is "I went to the casino and had a great time! Don't understand how you could have lost money." | |
| ▲ | antonvs 7 hours ago | parent | prev [-] | | Skill comes from experience. It takes a good amount of working with these models to learn how to use them effectively, when to use them, and what to use them for. Otherwise, you end up hitting their limitations over and over and they just seem useless. They're certainly not perfect, but many of the issues that people post about as though they're show-stoppers are easily resolved with the right tools and prompting. | | |
| ▲ | BAM-DevCrew 6 hours ago | parent [-] | | 20% tools, 40% prompt, 40% claude.md (agents.md) = 98% success most of the time. A few errors to correct is not the end of the world. | | |
| ▲ | antonvs 6 hours ago | parent [-] | | Right. But "prompt" also covers a lot of ground, e.g. planning, tracking tasks, etc. The codex-style frameworks do a good amount of that for you, but it can still make a big difference to structure what you're asking the model to do and let it execute step by step. A lot of the failures people talk about seem to involve expecting the models to one-shot fairly complex requirements. |
|
|
|
|