Remix.run Logo
cityofdelusion 26 minutes ago

I’m eager to test this out. I have agent instructions to try to limit the worst of this already, but patterns still sneak through. I have a review agent run after every single edit looking for all of the following if you need more ideas for checks:

- DRY principle violations, multiple definitions of the same helpers or utilities.

- Changes that deviate from existing patterns and architecture already in the code, especially in nearby and related code

- Comments that add no context or simply restate the field name.

- Naming violations (enterprise factoryfactoryabstraction stuff, excessively long names, overly technical names, banned words like “seam”, “durable”, and no-value-qualifiers like “SaveGame” -> “Save”).

- Tests that check implementations instead of correct business behavior.

- Overly backwards-compatible unless asked for (this one is incredibly hard to keep under control, as AI loves to guard everything even if the previous code was never deployed and thus there is no contract break)

- Un-necessary guard code (this is hard to control, most common case is the AI not relying on the serializer error handler and instead adding guards that the library already handles)

- Changing public API contracts without express permission to do so (depends on the code, eg a library JAR or versioned REST service)

- Meta references to previous code versions, to tasks or todos, or to instructions and other non-code context (e.g you tell the AI the adder should ignore negative numbers and that meta fact enters the comments or code)

I usually hand review all changes myself but it’s incredibly tedious so I try to first pass with the review agent until it comes back clean. I hate wasting tokens on it though.

smj-edison 21 minutes ago | parent [-]

Oh my gosh, the guard code drives me crazy. In try so hard to get Kimi to put in asserts instead of silently swallowing corrupt values, but it keeps handling bad values poorly instead of crashing. I've even explicitly put in CLAUDE.md that correctness is more important then continuing to run, but it still keeps defensively programming when it should loudly crash.