| ▲ | onionisafruit 6 hours ago |
| I noticed this unusual line in go.mod and got curious why it is using replace for this (typically you would `go get github.com/Masterminds/semver/v3@v3.4.0` instead):

    replace github.com/Masterminds/semver/v3 => github.com/Masterminds/semver/v3 v3.4.0
I found this very questionable PR[0]. It appears to have been triggered by dependabot creating an issue for a version upgrade -- which is probably unnecessary to begin with. The copilot agent then implemented that by adding a replace statement, which is not how you are supposed to do this. It also included some seemingly unrelated changes. The copilot reviewer called out the unrelated changes, but the human maintainer apparently didn't notice and merged anyway. There is just so much going wrong here. [0] https://github.com/github/gh-aw/pull/4469 |
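For contrast, a sketch of what the two approaches leave in go.mod (the replace line is the one quoted above; the require line is what the `go get` command would update):

```
// What the PR added -- replace is meant for forks and local overrides,
// not for routine version bumps:
replace github.com/Masterminds/semver/v3 => github.com/Masterminds/semver/v3 v3.4.0

// What `go get github.com/Masterminds/semver/v3@v3.4.0` would write instead
// (updating the existing require directive):
require github.com/Masterminds/semver/v3 v3.4.0
```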
|
| ▲ | spankalee 5 hours ago | parent | next [-] |
| This happens with all agents I've used and package.json files for npm. Instead of using `npm i foo` the agent string-edits package.json and hallucinates some version to install. Usually it's a kind of ok version, but it's not how I would like this to work. It's worse with renaming things in code. I've yet to see an agent be able to use refactoring tools (if they even exist in VS Code) instead of brute-forcing renames with string replacement or sed. Agents use edit -> build -> read errors -> repeat, instead of using a reliable tool, and it burns a lot more GPU... |
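A sketch of the difference (the package name `foo` and both version numbers are made up): `npm i foo` resolves the version against the registry and keeps package-lock.json in sync, while a string edit just guesses:

```
// hand-edited by the agent: version hallucinated, lockfile untouched
"dependencies": { "foo": "^2.3.1" }

// written by `npm i foo`: version resolved from the registry,
// and package-lock.json updated to match
"dependencies": { "foo": "^2.4.0" }
```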
| |
| ▲ | embedding-shape 4 hours ago | parent | next [-] | | > This happens with all agents I've used and package.json files for npm. Instead of using `npm i foo` the agent string-edits package.json and hallucinates some version to install. When using codex, I usually have something like `Never add 3rd party libraries unless explicitly requested. When adding new libraries, use `cargo add $crate` without specifying the version, so we get the latest version.` and it seems to make this issue not appear at all. | | |
| ▲ | teaearlgraycold 2 hours ago | parent [-] | | Eventually this specific issue will be RLHF’d out of existence. For now that should mostly solve the problem, but these models aren’t perfect at following instructions. Especially when you’re deep into the context window. | | |
| ▲ | girvo 2 hours ago | parent [-] | | > Especially when you’re deep into the context window. Though that is, at least to me, a bit of an anti-pattern for exactly that reason. I've found it far more successful to blow away the context and restart with a new prompt from the old context instead of having a very long-running back-and-forth. It's better than it was with the latest models, I can have them stick around longer, but it's still a useful pattern to use even with 4.6/5.3 | | |
| ▲ | teaearlgraycold 2 hours ago | parent [-] | | Opus has also clearly been trained to clear the context fairly often through the plan/code/plan cycle. |
|
|
| |
| ▲ | threecheese 2 hours ago | parent | prev | next [-] | | For the first, I think maintaining package-add instructions is table stakes, we need to be opinionated here. Agents are typically good at following them, and if not you can fall back to a Makefile that does everything. For the second, I totally agree. I continue to hope that agents will get better at refactoring, and I think using LSPs effectively would make this happen. Claude took dozens of minutes to perform a rename which Jetbrains would have executed perfectly in like five seconds. Its approach was to make a change, run the tests, do it again. Nuts. | |
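One way to make that Makefile fallback concrete (a sketch; the target and variable names here are made up, not from any real project): give the agent one blessed entry point so it never string-edits the manifest directly:

```make
# Hypothetical Makefile target the agent is instructed to use
# instead of hand-editing package.json / go.mod / Cargo.toml.
.PHONY: add-dep
add-dep:
	npm install $(PKG)   # or: go get $(PKG) / cargo add $(PKG)

# usage: make add-dep PKG=left-pad
```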
| ▲ | root_axis 2 hours ago | parent | prev | next [-] | | > brute-forcing renames with string replacement That's their strategy for everything the training data can't solve. This is the main reason the autonomous agent swarm approach doesn't work for me. 20 bucks in tokens just obliterated with 5 agents exchanging hallucinations with each other. It's way too easy for them to amplify each other's mistakes without a human to intervene. | |
| ▲ | richardw 4 hours ago | parent | prev | next [-] | | Totally. Surely IDEs like Antigravity are meant to give the LLM more tools to use for e.g. refactoring or dependency management? I haven’t used it but it seems a quick win to move from token generation to deterministic tool use. | | |
| ▲ | port11 4 hours ago | parent [-] | | As if. I’ve had Gemini stuck on AG because it couldn’t figure out how to use only one version of React. I managed to detect that the build failed because 2 versions of React were being used, but it kept saying “I’ll remove React version N”, and then proceeding to add a new dependency of the latest version. Loops and loops of this. On a similar note AG really wants to parse code with weird grep commands that don’t make any sense given the directory context. |
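For the duplicate-React case specifically, npm has a deterministic escape hatch the agent never reached for: an `overrides` field in package.json that pins every transitive copy to one version (the version number here is just an example):

```
{
  "overrides": {
    "react": "18.3.1",
    "react-dom": "18.3.1"
  }
}
```

Running `npm ls react` afterwards shows whether the tree has actually been deduplicated to a single copy.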
| |
|
|
| ▲ | bakibab 4 hours ago | parent | prev | next [-] |
They are trying to fix it using this comment but cancelled midway. Not sure why. https://github.com/github/gh-aw/pull/14548 |
| |
|
| ▲ | Lucasoato 3 hours ago | parent | prev | next [-] |
| It is so important to use specific prompts for package upgrading. Think about what a developer would do:
- check the latest version online;
- look at the changelog;
- evaluate whether it’s worth upgrading, or whether an intermediate version may be alright in case code updates are necessary.

Of course, ideally keep these operations among the human ones, but if you really want to automate this part (and you are ready to pay the consequences) you need to mimic the same workflow.
I use Gemini and Codex to look for package version information online: they check the changelogs from the version I’m on to the one I’d like to upgrade to, and I spawn a Claude Opus subagent to check whether anything in the code needs to be updated. In case of major releases, I git clone the two packages and another subagent checks whether the interfaces I use changed. Finally, I run all my tests and verify everything’s alright. Yes, it still might not be perfect, but neither am I. |
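A minimal sketch of that "evaluate the upgrade" step (assuming plain `x.y.z` semver strings with no pre-release tags; the version numbers below are made-up examples, not from any real changelog): classify the bump so an agent or human knows how much review is warranted.

```python
def classify_bump(current: str, target: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for current -> target.

    Assumes plain x.y.z versions (optionally prefixed with 'v');
    pre-release tags like '-rc1' are not handled in this sketch.
    """
    cur = [int(p) for p in current.lstrip("v").split(".")]
    tgt = [int(p) for p in target.lstrip("v").split(".")]
    if tgt[0] != cur[0]:
        return "major"   # interfaces may have changed: diff the two releases
    if tgt[1] != cur[1]:
        return "minor"   # new features: skim the changelog
    if tgt[2] != cur[2]:
        return "patch"   # bug fixes: usually safe to take
    return "none"

print(classify_bump("v3.3.1", "v3.4.0"))  # minor
print(classify_bump("2.9.5", "3.0.0"))    # major
```

A "major" result is the cue for the heavier steps in the workflow above (cloning both releases and diffing the interfaces you actually use).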
|
| ▲ | awesome_dude 2 hours ago | parent | prev [-] |
This is more evidence of my core complaint with AI (and why it's not AGI at this point). The AI hasn't understood what's going on; instead it has pattern-matched strings and used those patterns to create new strings that /look/ right, but fail upon inspection. (The human involved is also failing my Turing test... ) |