latentsea 2 days ago

Things like that are only part of it. You can also up your agent's batting average by building guardrails and finding ways to inject the right context at the right time.

For instance, our project has a task runner that provides a central point for all manner of things: linting, building, testing, local deployment, etc. The build, lint, and test tasks are shared between local development and CI. The test tasks run the tests, then take the TRX files and parse them with a library to produce a report, so the agent gets easy access to the same information about test failures that CI puts out. The various test suites write their reports under a consistent folder structure, and their logs to disk under a consistent folder structure too. On failure, the test tasks print a message telling the agent to look at the detailed test reports and cross-reference them with the logs to debug the issue. Where possible, the reports have correlation IDs inlined.
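The report step can be sketched roughly like this. This is a minimal illustration, not the commenter's actual code: the report wording and any paths are invented, and a real setup would use whatever TRX-parsing library the project already depends on. The XML namespace is the standard one for TRX files.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class TrxReport
{
    // Standard namespace used by Visual Studio / vstest TRX result files.
    static readonly XNamespace Ns =
        "http://microsoft.com/schemas/VisualStudio/TeamTest/2010";

    public static string Render(string trxPath)
    {
        var doc = XDocument.Load(trxPath);

        // Collect only the failed test results, with name and error message.
        var failures = doc.Descendants(Ns + "UnitTestResult")
            .Where(r => (string?)r.Attribute("outcome") == "Failed")
            .Select(r => new
            {
                Test = (string?)r.Attribute("testName"),
                Message = r.Descendants(Ns + "Message").FirstOrDefault()?.Value?.Trim(),
            })
            .ToList();

        if (failures.Count == 0) return "All tests passed.";

        // Point the agent at the logs to cross-reference (hypothetical layout).
        var lines = new[] { $"{failures.Count} test(s) failed. See the logs folder and match correlation IDs." }
            .Concat(failures.Select(f => $"- {f.Test}: {f.Message}"));
        return string.Join(Environment.NewLine, lines);
    }
}
```

The key design point is that the same parser runs locally and in CI, so the agent debugs against the exact report CI would produce.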

With the above system, when the agent is working through implementing something and the tests don't pass, it naturally winds up inspecting the test reports, cross-referencing them with the logs, and solving the problem at a much higher rate than if it had to take a wild guess at how to run the tests and then do something random.

Getting it to write its own guardrails by creating Roslyn Analyzers that fail the build when it deviates from the project's architecture and conventions has been another big win.
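As a rough sketch of what such an analyzer looks like, here is a minimal one enforcing a hypothetical layering rule (Domain types must not implement Infrastructure interfaces). The rule, namespaces, and diagnostic ID are all illustrative, not the commenter's actual conventions; a real analyzer would also inspect member references, not just interfaces.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class DomainLayerAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new(
        id: "ARCH001", // hypothetical diagnostic ID
        title: "Domain must not depend on Infrastructure",
        messageFormat: "Domain type '{0}' references Infrastructure type '{1}'",
        category: "Architecture",
        defaultSeverity: DiagnosticSeverity.Error, // error severity fails the build
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSymbolAction(AnalyzeNamedType, SymbolKind.NamedType);
    }

    private static void AnalyzeNamedType(SymbolAnalysisContext context)
    {
        var type = (INamedTypeSymbol)context.Symbol;
        if (!type.ContainingNamespace.ToDisplayString().EndsWith(".Domain"))
            return;

        // Flag any implemented interface that lives in an Infrastructure namespace.
        foreach (var iface in type.AllInterfaces)
        {
            if (iface.ContainingNamespace.ToDisplayString().Contains(".Infrastructure"))
                context.ReportDiagnostic(Diagnostic.Create(
                    Rule, type.Locations[0], type.Name, iface.Name));
        }
    }
}
```

Because the diagnostic severity is Error, any violation stops the build, which is what turns a convention into a guardrail the agent can't talk its way past.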

Tonnes of small things like that start to add up.

Next on my list is getting a debug MCP server, so the agent can set breakpoints, step through code, etc.