shepherdjerred 5 days ago

You can prevent quite a lot of these issues if you write rules for Cursor or your preferred IDE

Linters can also help quite a bit. In the end, you either have your rules enforced programmatically or by a human in review.
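If you haven't tried it, a rules file is basically just plain-text instructions that get included in the model's context. A sketch of what that can look like (a hypothetical .cursorrules; newer Cursor versions put individual rule files under .cursor/rules/ instead, and the wording is whatever you want):

    # hypothetical .cursorrules
    - Do not add comments unless explicitly asked to.
    - Prefer small, focused functions over long ones.
    - Run the project's linter and fix every violation before declaring a task done.

None of that is enforced by itself, but it's the same idea as a style guide: write the rule down once instead of repeating it in every review.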

I think it's a very different (and, so far, for me, uncomfortable) way of working, but there can be real benefits, especially as the tooling improves.

sshine 5 days ago

It seems like people who use AI for coding need to reinvent a lot of the same basic principles of software engineering before they gradually propagate into the mainstream agentic frameworks.

Coding agents come with a lot of good behavior built in.

Like "planning mode" where they create a strong picture of what's to be made before touching files. This has honestly improved my workflow at programming from wanting to jump into prototyping before I even have a clear idea, to being very spec-oriented: Of course there needs to be a plan, especially when it will be drafted for me in seconds.

But the sheer number of preventable dumb things coding agents will do, things that need to be explicitly stated and meticulously repeated in their context, reveals that simply training on the world's knowledge doesn't fully capture senior software engineering workflows, and captures a lot of the human averageness that is frowned upon.

shepherdjerred 4 days ago

100%

A lot has to be re-done. Using an IDE like Cursor is really a skill in its own right & you likely won't see a productivity boost from agents without moderate investment, and even then there are tradeoffs.

I think the real benefit comes in a few years, when more engineering has been done and the tools are more polished. The way that I look at it is that these tools are the worst they'll ever be.

cardanome 5 days ago

Do those rules really work? I added a rule to not add comments, and I still have to constantly remind the model not to add them despite it.

ewoodrich 5 days ago

I have a .roorules file with only about four instructions, one of which is an (unintentional) binary canary for very simple rule-following at the end of a task. Another rule is a fuzzier canary: it's not always applicable, but it usually comes up a few times per task, so it helps me confirm the rules are being parsed at all in case Roo has a bug.
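Roughly the shape of it (illustrative wording, not my actual file):

    # .roorules (illustrative, not the real contents)
    - When you have finished a task, end your final message with the single word: DONE
    - ALWAYS list the relative path of EVERY file you modified when summarizing your changes

The first one is the binary canary: it either did it or it didn't. The second is the fuzzier one, since it only applies to tasks that actually touch files.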

All the models I've used (yes, including all the biggest, newest, smartest ones) follow the binary rule about 75% of the time at the very most. Usually it's closer to 50% on average, with the odds dropping significantly as the context grows, since the rule applies at the end of a task; beyond that there seems to be no predictable pattern.

The fuzzier rule does slightly better, at around 80% compliance, I'm guessing because it applies earlier in the context window and because it uses lots of caps and emphasis. Its failure mode is more predictable: it tracks how much of the task is reading code versus thinking/troubleshooting/time the model spends "in its own head". When it's mostly reading code or my instructions, compliance is very high; when it's doing extended troubleshooting, or anything that veers away from the project itself and into training data, compliance is much lower.

So it's hit and miss: rules do help, but they're definitely not something I'd rely on as a hard guardrail for things like not executing commands, which Roo controls with a non-LLM tool config. Over time I hope agentic runners add more deterministic config outside the model itself, because instructions still aren't as reliable as they should be and don't seem to be getting substantially better in real use.

shepherdjerred 4 days ago

I've found that it mostly works, though it can still makes mistakes.

e.g. I had a lint rule enabled that the AI would always violate & then have to iterate to fix. I added a rule telling it not to do that particular thing, and most of the time it then wrote code that passed the linter on the first try.
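Not my exact setup, but the pattern is: whatever the linter keeps flagging, spell it out in the rules file too. Say the linter enforces @typescript-eslint/no-explicit-any and the model keeps writing any; the matching rule would be something like:

    # hypothetical rules-file line alongside the lint config
    - Never use the TypeScript any type; use unknown or a precise type instead.

The linter stays the actual enforcement; the rule just stops the model from generating the violation in the first place.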