survirtual 8 hours ago

This is the key, with test driven dev sprinkled in.

You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.

I model this as "grounding". Just like you need to ground an electrical system, you need to ground the LLM to reality. The tests do this, so they are REQUIRED for all LLM coding.

Once a framework is established, you require tests for everything. No code is written without tests. These can also be perf tests. They need solid metrics in order to output quality.
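As a minimal sketch of what "specs captured as tests" can look like, including a perf test with a concrete metric: the `slugify` function, its spec, and the time budget below are all hypothetical, picked only for illustration.

```python
import re
import time


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_spec_basic_slug() -> None:
    # Spec: words are lowercased and joined with single hyphens.
    assert slugify("Hello, World!") == "hello-world"


def test_spec_no_alphanumerics() -> None:
    # Spec: input without letters or digits yields an empty slug.
    assert slugify("  !!! ") == ""


def test_perf_budget() -> None:
    # Perf test with an explicit metric. The budget (10,000 calls in
    # under 2 seconds) is an assumed, deliberately generous number.
    start = time.perf_counter()
    for _ in range(10_000):
        slugify("The Quick Brown Fox Jumps Over the Lazy Dog")
    elapsed = time.perf_counter() - start
    assert elapsed < 2.0
```

The point is that each assertion is a line of the spec the model cannot quietly reinterpret without a test going red.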

The tests provide context and documentation for future LLM runs.

This is also the same way I'd handle foreign teams that, through no fault of their own, would often output subpar code. It was mainly because of a lack of cultural context, communication misunderstandings, and no solid metrics to measure against.

Our main job with LLMs now as software engineers is to be a strange sort of manager, with a mix of solutions architect, QA director, and patterns expertise. It is actually a lot of work and takes a lot of human effort to manage, but the results are real.

I have been experimenting with how meta I can get with this, and the results have been exciting. At one point, I had well over 10 agents working on the same project in parallel, following several design patterns, and they worked so fast I could no longer follow the code. But with layers of tests, layers of agents auditing each other, and isolated domains with well defined interfaces (just as I would expect in a large scale project with multiple human teams), the results speak for themselves.

I write all this to encourage people to take a different approach. Treat the LLMs like they are junior devs or a foreign team speaking a different language. Remember all the design patterns used to get effective use out of people regardless of these barriers. Use them with the LLMs. It works.

layer8 4 minutes ago | parent | next [-]

> Once specs are captured as tests, the LLM can no longer hallucinate.

Tests are not a correctness proof. I can’t trust LLMs to correctly reason about their code, and tests are merely a sanity check; they can’t verify that the code was correctly reasoned.

techpression 6 hours ago | parent | prev | next [-]

> You provide basic specs and can work with LLMs to create thorough test suites that cover the specs. Once specs are captured as tests, the LLM can no longer hallucinate.

Except when it decides to remove all the tests, change their meaning to make them pass, or write something not in the spec. Hallucinations are not a problem of the input given; they are built into the foundations of LLMs, and so far nobody has solved them. Thinking it won’t happen can and will have really bad outcomes.

CuriouslyC 20 minutes ago | parent | next [-]

You can solve this easily by having a separate agent write the tests, and not giving the implementing agent write permission on test files.
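A rough sketch of the "no write permission on test files" part, done as a check on the list of files a change touches. The glob patterns and the function name are assumptions about a hypothetical repo layout, not any particular tool's API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence

# Assumed layout: tests live under a tests/ directory or follow the
# usual test_*.py / *_test.py naming.
PROTECTED_PATTERNS = ("tests/*", "*/tests/*", "test_*.py", "*_test.py")


def protected_paths_touched(
    changed_paths: Iterable[str],
    patterns: Sequence[str] = PROTECTED_PATTERNS,
) -> List[str]:
    """Return the changed paths the implementing agent may not modify."""
    touched = []
    for path in changed_paths:
        name = PurePosixPath(path).name
        # Match against the full path (directory rules) and against the
        # file name alone (naming rules).
        if any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns):
            touched.append(path)
    return touched


if __name__ == "__main__":
    change = ["src/app.py", "tests/test_app.py"]
    blocked = protected_paths_touched(change)
    if blocked:
        print("rejected, implementing agent touched:", blocked)
```

Run as a pre-merge check, a non-empty result rejects the implementing agent's change before it reaches review.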

survirtual 5 hours ago | parent | prev [-]

It doesn't matter, because use of version control is mandatory. When things go missing or get bypassed, audit-instructed LLMs detect these issues and roll back the changes.

I like to keep domains with their own isolated workspaces and git repos. I am not there yet, but I plan on making a sort of local-first gitflow where agents have to pull the codebase, make a new branch, make changes, and submit pull requests to the main codebase.
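Roughly, the per-agent flow would be the following sequence of plain git commands. This is only a sketch: the helper names, repo URL, and branch name are hypothetical, and the pull request step is left out because it depends on where the main codebase is hosted.

```python
from typing import List

Command = List[str]


def start_commands(repo_url: str, workdir: str, branch: str) -> List[Command]:
    """Commands an agent runs before making changes: clone, then branch."""
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", "-b", branch],
    ]


def submit_commands(workdir: str, branch: str, message: str) -> List[Command]:
    """Commands an agent runs after making changes: stage, commit, push."""
    return [
        ["git", "-C", workdir, "add", "--all"],
        ["git", "-C", workdir, "commit", "-m", message],
        ["git", "-C", workdir, "push", "--set-upstream", "origin", branch],
    ]


if __name__ == "__main__":
    # Hypothetical values, printed rather than executed.
    for cmd in start_commands("file:///srv/main.git", "agent-1", "agent-1/feature"):
        print(" ".join(cmd))
```

Each command list can be handed to `subprocess.run(cmd, check=True)` inside the agent's sandbox.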

I would ultimately like to make this a one-liner for agents, where new agents are sandboxed with specific tools and permissions and start from a clone of the main codebase.

Fresh-context agents then can function as code reviewers, with escalation to higher tier agents (higher tier = higher token count = more expensive to run) as needed.
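The escalation logic itself is simple. A minimal sketch, where each reviewer is a stand-in callable rather than a real agent and the verdict strings are an assumed convention:

```python
from typing import Callable, Sequence, Tuple

Verdict = str  # assumed convention: "approve", "reject", or "escalate"
Reviewer = Callable[[str], Verdict]


def review_with_escalation(diff: str, tiers: Sequence[Reviewer]) -> Tuple[int, Verdict]:
    """Ask the cheapest reviewer first; move up a tier only on "escalate"."""
    if not tiers:
        raise ValueError("at least one reviewer tier is required")
    for tier, reviewer in enumerate(tiers):
        verdict = reviewer(diff)
        if verdict != "escalate":
            return tier, verdict
    # Every tier passed the buck: fail closed instead of approving.
    return len(tiers) - 1, "reject"


def cheap_reviewer(diff: str) -> Verdict:
    # Stand-in rule: anything touching tests goes to a higher tier.
    return "escalate" if "tests/" in diff else "approve"


def expensive_reviewer(diff: str) -> Verdict:
    # Stand-in rule: deleting test files is never acceptable.
    return "reject" if "deleted file" in diff else "approve"
```

Failing closed when every tier escalates is a design choice here; the alternative would be to hand the change to a human.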

In my experience, with correct prompting, LLMs will self-correct when exposed to auditors.

If mistakes do make it through, it is all version controlled, so rolling back isn't hard.

CuriouslyC 18 minutes ago | parent [-]

This is the right flow. As agents get better, work will move from devs orchestrating in IDEs/TUIs to reactive, event-driven orchestration surfaced in VCS with developers on the loop. It cuts out the middleman and lets teams collaboratively orchestrate and steer.

skydhash 5 hours ago | parent | prev [-]

But do you understand the problem and its context well enough to write tests for the solution?

Take Prolog and logic programming. It's all about describing the problem and its context and letting the solver find the solution. Try writing your specs in pseudo-Prolog code and you will be surprised by how much missing information you're leaving up to chance.
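To make that concrete without Prolog, here is a toy version in Python: the spec is a predicate, and a brute-force "solver" lists every output that satisfies it. The sorting example and all names are made up for illustration.

```python
from collections import Counter
from itertools import product
from typing import Callable, List

Spec = Callable[[List[int], List[int]], bool]


def is_sorted(xs: List[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


def same_elements(xs: List[int], ys: List[int]) -> bool:
    return Counter(xs) == Counter(ys)


def solutions(inp: List[int], spec: Spec, domain=range(3)) -> List[List[int]]:
    """Every same-length output over a small domain that satisfies spec."""
    return [
        list(out)
        for out in product(domain, repeat=len(inp))
        if spec(inp, list(out))
    ]


def loose_spec(inp: List[int], out: List[int]) -> bool:
    # "The output is sorted." Says nothing about where the values come from.
    return is_sorted(out)


def tight_spec(inp: List[int], out: List[int]) -> bool:
    # Adds the missing fact: the output is a rearrangement of the input.
    return is_sorted(out) and same_elements(inp, out)


if __name__ == "__main__":
    inp = [2, 0, 1]
    print(len(solutions(inp, loose_spec)))  # 10 outputs satisfy the loose spec
    print(solutions(inp, tight_spec))       # [[0, 1, 2]]
```

With the loose spec, `[0, 0, 0]` is a perfectly valid "sort" of `[2, 0, 1]`. Anything the spec doesn't pin down is something the solver, or the LLM, gets to choose.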