techpression 6 hours ago

> You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.

Except when it decides to remove all the tests, change their meaning to make them pass, or write something not in the spec. Hallucinations are not a problem of the input given; they are in the foundations of LLMs, and so far nobody has solved them. Assuming they won’t happen can and will lead to really bad outcomes.

CuriouslyC 34 minutes ago

You can solve this easily by having a separate agent write the tests, and not giving the implementing agent write permission on test files.
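
As a rough sketch of that separation, a pre-merge check could reject any change from the implementing agent that touches test files. The role names (`implementer`, `test_writer`) and the glob patterns are assumptions made up for illustration, not part of any particular agent framework.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files that only the test-writing agent may modify.
PROTECTED_PATTERNS = ["tests/*", "*_test.py", "test_*.py"]


def is_protected(path: str) -> bool:
    """Return True if the path, or its file name, matches a protected pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(
        fnmatchcase(path, pattern) or fnmatchcase(name, pattern)
        for pattern in PROTECTED_PATTERNS
    )


def rejected_writes(role: str, changed_paths: list[str]) -> list[str]:
    """List the changed paths that the given role is not allowed to modify."""
    if role == "test_writer":  # assumed role name for the test-writing agent
        return []
    return [path for path in changed_paths if is_protected(path)]
```

A change set would be refused whenever `rejected_writes("implementer", paths)` is non-empty. Enforcing the same rule with file system or repository permissions would be stronger than a check the agent could route around.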

survirtual 6 hours ago

It doesn't matter, because use of version control is mandatory. When things go missing or get bypassed, audit-instructed LLMs detect these issues and roll back the changes.

I like to keep domains with their own isolated workspaces and git repos. I am not there yet, but I plan on making a sort of local-first gitflow where agents have to pull the codebase, make a new branch, make changes, and submit pull requests to the main codebase.

I would ultimately like to make this a one-liner for agents, where new agents are sandboxed with specific tools and permissions and clone the main codebase.
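
A minimal sketch of what such a local-first flow might look like, using only plain git commands. The branch naming scheme and function names are assumptions, "submitting a pull request" is approximated here by pushing the branch back to the main repo for review, and the sandboxing itself is not modeled.

```python
import subprocess


def branch_name(agent_id: str, task: str) -> str:
    """Hypothetical naming scheme: one branch per agent and task."""
    return f"agent/{agent_id}/{task}"


def setup_commands(main_repo: str, workdir: str, branch: str) -> list[list[str]]:
    """Commands to clone the main codebase and start a new branch."""
    return [
        ["git", "clone", main_repo, workdir],
        ["git", "-C", workdir, "checkout", "-b", branch],
    ]


def submit_commands(workdir: str, branch: str, message: str) -> list[list[str]]:
    """Commands to commit the agent's changes and push the branch for review."""
    return [
        ["git", "-C", workdir, "add", "--all"],
        ["git", "-C", workdir, "commit", "-m", message],
        ["git", "-C", workdir, "push", "origin", branch],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping the command lists separate from `run_all` makes the plan easy to log or audit before anything is executed.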

Fresh-context agents can then function as code reviewers, with escalation to higher-tier agents (higher tier = higher token count = more expensive to run) as needed.
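
The escalation idea could be sketched roughly as follows, with each reviewer modeled as a plain function that approves, rejects, or declines to decide. The `Reviewer` type, the tier names, and the fallback to a human are assumptions for illustration; a real reviewer would call a model with a fresh context.

```python
from typing import Callable, Optional

# A reviewer returns True (approve), False (reject), or None (unsure, escalate).
Reviewer = Callable[[str], Optional[bool]]


def review_with_escalation(
    diff: str, tiers: list[tuple[str, Reviewer]]
) -> tuple[str, bool]:
    """Try reviewers from cheapest to most expensive until one gives a verdict.

    Returns the name of the deciding tier and its verdict. If every tier is
    unsure, the change is rejected and attributed to "human" for follow-up.
    """
    for name, reviewer in tiers:
        verdict = reviewer(diff)
        if verdict is not None:
            return name, verdict
    return "human", False
```

Ordering `tiers` from cheapest to most expensive means the costly reviewers only run on the diffs the cheaper ones could not settle.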

In my experience, with correct prompting, LLMs will self-correct when exposed to auditors.

If mistakes do make it through, it is all version controlled, so rolling back isn't hard.

CuriouslyC 32 minutes ago

This is the right flow. As agents get better, work will move from devs orchestrating in IDEs/TUIs to reactive, event-driven orchestration surfaced in VCS, with developers on the loop. It cuts out the middleman and lets teams collaboratively orchestrate and steer.