bcrosby95 5 days ago

I don't think it's that simple.

Fundamentally, unit tests are using the same system to write your invariants twice; it just so happens that the two versions are different enough that a failure in one tends to reveal a bug in the other.

You can't reasonably state this won't be the case with tools built for code review until the failure cases are examined.

Furthermore, a simple way to get around this is to write the code with one product and review it with another.

jmull 5 days ago | parent [-]

> unit tests are using the same system to write your invariants twice

For unit tests, the parts of the system that are the same are not under test, while the parts that are different are under test.
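To make that concrete, here is a minimal sketch (the names `insertion_sort` and `check_sort_invariants` are made up for illustration). The implementation says how to sort; the test says what a sorted result must look like. Both lean on the same interpreter and the same comparison operators, and those shared parts are exactly what the test does not check.

```python
from collections import Counter


def insertion_sort(items):
    """Implementation under test: describes *how* to sort."""
    result = []
    for item in items:
        # Walk left past every element that is larger than the new item.
        i = len(result)
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result


def check_sort_invariants(sort_fn, items):
    """Test: describes *what* a correct result looks like.

    Shared with the implementation (and so not under test): Python itself,
    list semantics, and the `<=` / `>` comparisons on the elements.
    """
    out = sort_fn(items)
    # Invariant 1: adjacent elements are in non-decreasing order.
    assert all(a <= b for a, b in zip(out, out[1:])), "not ordered"
    # Invariant 2: the output is a permutation of the input.
    assert Counter(out) == Counter(items), "not a permutation"
    return True


if __name__ == "__main__":
    check_sort_invariants(insertion_sort, [3, 1, 2, 1])
```

The invariants are written a second time in a different form, so a bug in the loop above would have to coincide with a matching mistake in the assertions to slip through.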

The problem with using AI to review AI is that what you're checking is the same as what you're checking it with. Checking the output of one LLM with another brand probably helps, but they may also have a lot of similarities, so it's not clear how much.

adastra22 3 days ago | parent | next [-]

> The problem with using AI to review AI is that what you're checking is the same as what you're checking it with.

This isn't true. Every instantiation of the LLM is different. Oversimplifying a little, but hallucination emerges when low-probability next words are selected. True explanations, on the other hand, act as attractors in state-space. Once stumbled upon, they are consistently preserved.

So run a bunch of LLM instances in parallel with the same prompt. The built-in randomness & temperature settings will ensure you get many different answers, some quite crazy. Evaluate them in new LLM instances with fresh context. In just 1-2 iterations you will home in on state-space attractors, which are chains of reasoning well supported by the training set.
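Roughly, as a sketch only: `generate` and `judge` below are hypothetical callables standing in for whatever LLM calls you use, not any real library's API, and the samples are drawn sequentially here to keep the example simple (in practice you would fire them off in parallel).

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical stand-ins for LLM calls (assumptions, not a real API):
#   generate(prompt, rng) -> one sampled answer
#   judge(prompt, answer) -> True if a fresh-context reviewer accepts it
Generate = Callable[[str, random.Random], str]
Judge = Callable[[str, str], bool]


def sample_and_filter(
    prompt: str,
    generate: Generate,
    judge: Judge,
    n_samples: int = 8,
    n_rounds: int = 2,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Sample several answers, then keep the ones that survive review."""
    rng = random.Random(seed)
    # Step 1: same prompt, many independent samples.
    candidates = [generate(prompt, rng) for _ in range(n_samples)]
    # Step 2: each judge call sees only the prompt and one answer,
    # which is the "fresh context" part.
    for _ in range(n_rounds):
        survivors = [c for c in candidates if judge(prompt, c)]
        if not survivors:
            break  # nothing passed review; keep the previous pool
        candidates = survivors
    # Answers that recur and survive are listed first.
    return Counter(candidates).most_common()


if __name__ == "__main__":
    # Toy stubs so the sketch runs without any model behind it.
    def toy_generate(prompt: str, rng: random.Random) -> str:
        return rng.choice(["4", "4", "4", "5", "22"])

    def toy_judge(prompt: str, answer: str) -> bool:
        return answer == "4"

    print(sample_and_filter("What is 2 + 2?", toy_generate, toy_judge))
```

The toy judge is an oracle, which a real reviewer instance obviously isn't; the point is only the shape of the loop: sample wide, review each answer in isolation, repeat once or twice.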

Demiurge 5 days ago | parent | prev | next [-]

What if you use a different AI model? Sometimes just a different seed generates a different result. I notice there is a benefit to seeing and contrasting the different answers. The improvement is gradual; it’s not binary.

adastra22 4 days ago | parent [-]

You don't need to use a different model, generally. In my experience a fresh context window is all you need, the vast majority of the time.

bcrosby95 4 days ago | parent | prev [-]

The system is the human writing the code.