roberttod 4 hours ago

It's unintuitive, but having an LLM verification loop act like a code reviewer works impeccably well. You can even create dedicated agents to check for specific problem areas, such as poor error handling.

This isn't about anthropomorphism; it's context engineering. By breaking the work into more agents, you get more focused context windows.
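A minimal sketch of what such a reviewer loop might look like. Everything here is hypothetical: `call_llm` is a stand-in for whatever model API you use, and the reviewer prompts are illustrative, not a real tool's configuration. The point is that each reviewer agent gets a narrow prompt and, in a real setup, its own fresh context window.

```python
# Hypothetical LLM verification loop: a generator's output (here, a diff)
# is checked by dedicated reviewer agents, each scoped to one problem area.

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call (Anthropic, OpenAI, etc.).
    # Here it "passes" any diff that contains error handling.
    return "PASS" if "raise" in prompt else "FAIL: no error handling"

# One narrow prompt per reviewer agent -- assumed names, not a real config.
REVIEWERS = {
    "error-handling": "Review ONLY the error handling in this diff. "
                      "Reply PASS or FAIL with reasons:\n\n",
    "security":       "Review ONLY security issues in this diff. "
                      "Reply PASS or FAIL with reasons:\n\n",
}

def review(diff: str) -> dict[str, str]:
    # Each reviewer sees only its own focused prompt plus the diff,
    # rather than the whole conversation history.
    return {name: call_llm(prompt + diff) for name, prompt in REVIEWERS.items()}

verdicts = review("try:\n    load()\nexcept IOError as e:\n    raise RuntimeError(e)")
```

In practice you would run the generator and reviewers as separate agent sessions and feed any FAIL verdicts back into the generator for another pass.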

I believe gas town has some review process built in, but my comment is more to address the idea that it's all slop.

As an aside, Opus 4.5 is the first model I've used that mostly doesn't produce slop, in case you haven't tried it. It still produces some, but little human input is required for building things; it's mostly the higher-level and architectural decisions that need guidance.

fragmede 3 hours ago | parent [-]

> it's mostly higher level and architectural things they need guidance on

Any examples you can share?

roberttod 2 hours ago | parent [-]

Mostly, it's not the model that's lacking but the visibility it has. Often the top-level business context for a problem is out of reach, spread across Slack, email, internal knowledge bases, and meetings.

Once I digest some of this and give it to Claude, it's mostly smooth sailing, but then the context window becomes the problem: compactions during implementation remove a lot of important information. There should really be a Claude instance monitoring the top-level context and passing work to agents. I'm currently figuring out how to orchestrate that nicely with Claude Code MD files.
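The orchestration idea above can be sketched roughly like this. This is an assumption-laden illustration, not Claude Code's actual mechanism: `run_agent` stands in for spawning an agent session, and the context areas and tasks are made up. The shape is what matters: one top-level process holds the digested business context and hands each sub-task to a fresh agent with only the slice it needs, so no single context window has to survive compaction.

```python
# Hypothetical orchestrator: routes slices of top-level context to
# worker agents instead of doing the work in one long session.

def run_agent(context: str, task: str) -> str:
    # Stub: in practice this would launch an agent session (e.g. a
    # Claude Code instance) seeded with `context` and the task prompt.
    return f"done: {task} (given {len(context)} chars of context)"

# Digested top-level context, keyed by area -- invented for illustration.
TOP_LEVEL_CONTEXT = {
    "auth": "Tokens expire after 15 min; refresh via /oauth/refresh.",
    "billing": "Invoices are immutable once issued.",
}

def orchestrate(tasks: dict[str, str]) -> list[str]:
    # The orchestrator never implements anything itself; it only picks
    # the relevant context slice and dispatches each task.
    return [run_agent(TOP_LEVEL_CONTEXT[area], task)
            for area, task in tasks.items()]

results = orchestrate({"auth": "add token refresh retry",
                       "billing": "add PDF export"})
```

Each worker starts with a small, relevant context, which is the same "more agents, more focused windows" idea from the earlier comment applied to implementation rather than review.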

With respect to architecture, it generally makes sound decisions, but I want to tweak them, often trading off simplicity against security and scale. These decisions seem very subtle and likely include personal preferences I haven't written down anywhere.