▲ | Ask HN: What am I doing wrong Re Agentic coding
15 points by tlonny 13 hours ago | 14 comments
Here is the prompt I gave both the Claude Code CLI and the VSCode agent for my TS project:

```
I have modified the type signature and behaviour of how jobs are created. Previously, job definition create took a batch argument (created from a queue). Now it takes the queue directly, is async, and requires the databaseClient to be passed in at creation (vs. when the batch is executed). It no longer returns anything - which is fine because the result was only being used for logging, which is now done for us so we don't have to worry. Can we refactor the codebase to make use of the new JobDefinition.create? Remove the vestigial "Job created" log please. Perform this task and this task only. If you see something unrelated that you believe needs to be refactored - DO NOT MODIFY IT. ONLY PERFORM ACTIONS DIRECTLY RELEVANT TO THIS TASK
```

So there are two instructions:

1. Do the task.
2. Don't do stuff that isn't the task (added in frustration on subsequent attempts).

My experience: the agent flow started well - it found all the files that needed to change and began making edits. By about file #5 I noticed that, on top of the requested refactor, it had started re-ordering object keys in the `JobDefinition.create` calls it was touching. Although semantically a no-op, this was incredibly frustrating as it made the diffs much harder to review. A little later it started modifying log messages it wasn't happy with, before eventually going completely off the rails and adding arguments to my function definitions that it _thought_ they needed (introducing type/run-time errors).

VSCode would periodically pause and ask for confirmation in order to continue. Each time I used the opportunity to re-prompt the agent to stay on target:

Me: "STOP GOING OFF TASK - STOP RENAMING VARIABLES, REORDERING PARAMS. JUST DO AS THE TASK TELLS YOU AND NOTHING ELSE"

Agent: "You're absolutely right. I apologize for going off task. Let me focus solely on the task: refactoring JobDefinition.create calls to use the new signature and removing vestigial "Job created" logs"

And each time the bad behaviour would return after a while. I'm not sure what I'm doing wrong. I assumed this sort of mechanical monkey work would be bread and butter for an agentic workflow, but it just keeps losing coherence. I ended up reverting all the changes because I had absolutely zero trust in the quality of the generated code.

I apologise for the wall of text, but I'm quite frustrated about all the time wasted and am desperate to know what I'm doing wrong! Thanks in advance!
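For readers trying to picture the refactor, here is a minimal TypeScript sketch of the before/after call shape, reconstructed purely from the description in the post above; every type and name apart from `JobDefinition.create`, `databaseClient`, and the queue/batch concepts is hypothetical.

```
// Hypothetical sketch of the signature change described in the post; every name
// here is a reconstruction from the prose, not the poster's real code.

type Queue = { name: string };
type Batch = { queue: Queue };
type DatabaseClient = { query: (sql: string) => Promise<unknown> };

// Old shape (as described): takes a batch, returns a value that was only used for logging.
declare const JobDefinitionOld: {
  create(args: { batch: Batch; name: string }): { id: string };
};

// New shape (as described): takes the queue directly, is async, needs the
// databaseClient at creation time, and returns nothing.
declare const JobDefinition: {
  create(args: { queue: Queue; databaseClient: DatabaseClient; name: string }): Promise<void>;
};

async function example(queue: Queue, databaseClient: DatabaseClient) {
  // Before:
  // const batch: Batch = { queue };
  // const result = JobDefinitionOld.create({ batch, name: "send-email" });
  // console.log("Job created", result.id); // the vestigial log to remove

  // After:
  await JobDefinition.create({ queue, databaseClient, name: "send-email" });
}
```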
▲ | spott 12 hours ago
The model is getting lost because the task you gave it is way up the context chain from what it is currently working on. It loses track of its task and starts working on other things that it notices.

The way to get around this is to never have the model just "do the thing". Have it create a plan and a todo list from the plan (it will typically do this on its own in Claude Code), then you "approve" the plan, and then it starts working against that todo list and plan. This ensures that the "task" is never very large (it is always just the next thing on the todo list, which has already been scoped to be small) and there is never any ambiguity over what to do next.

So for your prompt, I would ask it to find all locations that use the old job API and put them in a planning document. For each location, have it note in the planning document whether it anticipates any difficulty transitioning to the new API. If you want to get fancy, have it use the Task tool so a subagent does the analysis, which keeps the context of the main model less cluttered. I usually use planning mode in Claude Code for this.

Then look at the plan, approve it (or tweak it), and have it execute that plan.
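As a rough illustration of the kind of planning document this describes (the file name, entries, and notes are invented for the example, not taken from the poster's codebase):

```
<!-- todo.md - hypothetical plan produced in planning mode, then approved by you -->
# Plan: migrate call sites to the new JobDefinition.create

- [ ] src/jobs/sendEmail.ts - straightforward: swap batch for queue, add await, pass databaseClient
- [ ] src/jobs/generateReport.ts - return value currently logged; delete the "Job created" log
- [ ] src/workers/scheduler.ts - possible difficulty: databaseClient not in scope at creation time

Rules: do not reorder object keys, rename variables, or touch unrelated log messages.
```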
▲ | yurifury 4 hours ago
While you aren't getting consistent, usable results from your setup, you shouldn't allow it to make any changes that you disagree with. For every change it proposes, review thoroughly and tell it what you'd prefer instead. It will get better during the session, and you can ask it to codify the rules you've communicated implicitly in the CLAUDE.md file.
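For instance, the rules implied by this thread might end up codified as something like the following sketch; the wording is invented, and only the CLAUDE.md convention itself comes from the comment.

```
<!-- CLAUDE.md - example project rules distilled from a session like this one -->
# Project conventions

- Only make changes that are directly required by the task you were given.
- Never reorder object keys, rename variables, or rewrite log messages unless asked.
- Do not add or remove function parameters unless the task explicitly calls for it.
- If something unrelated looks wrong, mention it in your summary instead of editing it.
```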
▲ | jdsully 11 hours ago
A few tips:

- Make tasks short and break them into smaller steps. E.g. don't say "Add a UI button, and a handler that does the thing". First add the button, confirm it's showing as expected, then move on to the handler, and so on.

- Do give warnings like "Don't modify unrelated code", but again, what the model considers related and what you consider related may not be the same (this is much easier if the tasks are small, so see point 1).

- If the model keeps making similar mistakes or repeating the same broken thing, it's because it doesn't know how to solve the problem. These models don't have a way to tell you "I don't know", so they will just keep producing busted code. Give the model additional information to help it, like you would a coworker who can't seem to make progress.
▲ | Mave83 13 hours ago
Maybe try forcing it to properly plan ahead, break the work down into small steps, and ask you to approve the plan. Also add a CLAUDE.md with clear development guidelines in it, have it verify the git changes it made against those guidelines, and of course run things like a linter. It will go off the rails, especially after compaction, but you can make it correct mistakes on its own.
▲ | cluckindan 11 hours ago
Don't tell it what not to do. Roughly speaking, it doesn't have the concept of "not foobar": mentioning such a negation in a prompt doesn't do what a human would expect, and will instead cause "foobar" activation, and possibly also everything that is "not" + "foobar", leading to the inattention/off-task behavior seen here. You want your prompt to resonate with the desired output in perfect harmony, and a "don't do X" is a bum note.
▲ | grim_io 12 hours ago
Telling the agent to very much not do something is a lost battle. It will make everything worse, not just the stuff it has already messed up. If you expect a genuine understanding of your instructions, you will be very disappointed, no matter what you do. The way to success is not caring about those small issues and fixing them up in review. If you get 95% of the way there, I'd say you did as well as you can hope for.
▲ | 8note 7 hours ago
the task is pretty unclear. you've got one real ask, for it to remove a log statement, but you've buried it in irrelevant context. it's better if you can let it read the before code, then read the after code, then give it the ask - remove the log, switch callers to the new interface. then, have it write up a work/prompt plan, and keep progress in a markdown file
▲ | yelirekim 13 hours ago
You're asking Claude to refactor multiple different job types all at once, which creates too much complexity in a single pass. The prompt itself is also somewhat unclear about the specific transformations needed. Try this:

1. Break it down by job type. Instead of "refactor the codebase to make use of the new JobDefinition.create", identify each distinct job type and refactor them one at a time. This keeps the context focused and prevents the agent from getting overwhelmed.

2. For many jobs, script it. If you have dozens or hundreds of jobs to refactor, write a shell script that runs one focused refactor per job and commits after each change (a rough sketch follows at the end of this comment). This creates atomic commits you can review/revert individually.

3. Consider a migration shim. Have Claude create a compatibility layer so jobs can work with either the old or new signature during the refactor. This lets you test incrementally without breaking everything at once.

4. Your prompt needs clarity. The issue with your original prompt is that it doesn't clearly specify the before/after states or which specific files to target. Claude Code works best with precise, mechanical instructions (the exact old signature, the exact new signature, the files to touch) rather than contextual descriptions like "Previously... Now it takes...". Pro tip: use Claude itself to improve your prompts by asking it to rewrite your description as a precise, mechanical instruction, and save the result to a markdown file for reuse.

The key insight is that agentic tools excel at focused, well-defined transformations but struggle when the scope is too broad or the instructions are ambiguous. "Don't do anything else" is not an instruction that Claude does a good job of interpreting. The "going off the rails" behavior you're seeing is Claude trying to be helpful by "improving" code it encounters, which is why explicit constraints ("ONLY do X") work better than a broad directive about what it shouldn't do.
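A minimal sketch of the per-job scripting idea from point 2 above. It is written in TypeScript (driving the CLI via child_process) rather than as a literal shell script, to match the poster's TS project; the file names, prompt wording, and the assumption that the Claude Code CLI's non-interactive `-p` mode is already permitted to edit files are all illustrative, not taken from the thread.

```
// migrate-jobs.ts - hypothetical one-call-site-at-a-time migration driver.
import { execSync } from "node:child_process";

// Invented list of files that still use the old JobDefinition.create signature.
const callSites = [
  "src/jobs/sendEmail.ts",
  "src/jobs/generateReport.ts",
];

for (const file of callSites) {
  const prompt =
    `In ${file} only: update JobDefinition.create calls to the new signature ` +
    `(pass the queue directly, await the call, pass databaseClient at creation, ` +
    `ignore the removed return value) and delete the "Job created" log. ` +
    `Do not modify anything else.`;

  // One focused, non-interactive edit per call site keeps each task small.
  execSync(`claude -p ${JSON.stringify(prompt)}`, { stdio: "inherit" });

  // Commit each file separately so every change can be reviewed or reverted on its own.
  execSync(`git add ${file} && git commit -m "refactor ${file}: new JobDefinition.create"`, {
    stdio: "inherit",
  });
}
```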
▲ | drekipus 10 hours ago
This is agentic programming; if you're not having fun, I suggest you jump ship.
▲ | crashprone 9 hours ago
A while ago, an HN comment linked to this blog post by Mario Zechner, https://mariozechner.at/posts/2025-06-02-prompts-are-code/, which was exactly what I needed back then. The workflow helped me analyse about 300 files. It definitely made up stuff along the way and missed valuable context that I would have seen had I done the analysis manually, but it's a good starting point. Without a structured workflow, as you said, after 5 files it starts going berserk.

That said, Claude has been really bad the last few weeks, especially in VSCode Copilot. If you ask it repeatedly, it admits that it's Sonnet 3.5 and not Sonnet 4. Not sure if that's true, but Sonnet's workflow has degraded over the last few weeks.

You would probably get better results by using search and replace or a python script to make those changes. The comments that insist you'd get better results by talking to it like you would to a human - I hope they're trolling. The idea that the tone of your prompt has any influence at all is laughable.
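For the purely mechanical part of the change (deleting the vestigial "Job created" log lines), a scripted pass could look roughly like the sketch below. It is in TypeScript rather than the Python the comment mentions, to match the poster's TS project; the glob pattern and exact log text are assumptions, and the signature change itself would still need a real codemod or manual edits.

```
// strip-job-created-logs.ts - hypothetical scripted pass that removes the
// vestigial "Job created" log lines across the source tree.
import { readFileSync, writeFileSync } from "node:fs";
import { globSync } from "glob"; // assumes the `glob` package is installed

for (const file of globSync("src/**/*.ts")) {
  const lines = readFileSync(file, "utf8").split("\n");
  const kept = lines.filter((line) => !line.includes('"Job created"'));
  if (kept.length !== lines.length) {
    writeFileSync(file, kept.join("\n"));
    console.log(`removed ${lines.length - kept.length} log line(s) from ${file}`);
  }
}
```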
▲ | hellsten 12 hours ago
$ claude

> Plan how to refactor the codebase to use the new JobDefinition.create function introduced in git commit <git commit hash>. Split the task into subtasks, if needed. Write the plan to todo.md.

...

> Start working on the task in @todo.md. Write code that follows the "Keep it simple, stupid!" principle.
▲ | cyanydeez 8 hours ago
It smells like AI in here.
▲ | perfmode 12 hours ago
be encouraging. say please. don't speak roughly. models perform better when treated with respect. speak to it as you would speak to someone you respect. have it create a plan. then verify its plan. then proceed to execute.