CuriouslyC 3 days ago

You're right, reviews aren't the way forward. We don't do code reviews on compiler output (unless you're writing a compiler). The way forward is strong static and analytic guardrails plus stochastic error correction: multiple solutions proposed with an LLM as judge before implementation, and multiple code-review agents with different personas prompted to be strict and adversarial without nit-picking, backed by robust test suites that have themselves been through multiple passes of audits and red-teaming by agents. You should rarely have to look at the code; it should be a significant escalation event, like when you need to coordinate with Apple over Xcode bugs.
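
A minimal sketch of the selection step that kind of workflow implies, assuming hypothetical propose/judge callables rather than any real agent API:

  # Hypothetical sketch (my illustration, not the commenter's actual tooling):
  # propose several candidate patches, score each with strict reviewer
  # personas, and only escalate to a human when nothing clears the bar.
  from typing import Callable, List, Optional

  def pick_candidate(
      propose: Callable[[], str],            # generates one candidate patch
      judges: List[Callable[[str], float]],  # reviewer personas, score in [0, 1]
      n_candidates: int = 5,
      threshold: float = 0.9,
  ) -> Optional[str]:
      best, best_score = None, 0.0
      for _ in range(n_candidates):
          patch = propose()
          score = min(judge(patch) for judge in judges)  # strictest persona decides
          if score > best_score:
              best, best_score = patch, score
      # None signals the "significant escalation event": a human looks at the code.
      return best if best_score >= threshold else None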

JackSlateur 3 days ago | parent | next [-]

Static and analytic guardrails??

Unless you are writing some shitty code for a random product that will be used for a demo and then trashed, code comes down to a simple thing:

  Code is a way to move ideas into the real world through a keyboard
So reading that the future is using a random machine with averaged output (by design), and that this output of average quality will be good enough because the same random machine will generate tests of the same quality: this is ridiculous.

Tests are probably the one thing you should never build randomly; you should put a lot of thought into them: do they make sense? Does your code make sense? With tests, you are forced to use your own code, sometimes the way your users will.

Writing tests is a good way to force yourself to be empathic with your users

People who code through AI are the equivalent of the pre-2015-era system administrators who renewed TLS certificates manually. They are people who can be replaced (and are replacing themselves) with bash scripts. I don't miss them and I won't miss this new kind.

CuriouslyC 3 days ago | parent [-]

I actually have a Bayesian stochastic process model for LLM codegen that incorporates the noisy-channel coding theorem. It turns out that just as noisy communication channels can be encoded to give arbitrarily low error-rate communication, LLM agent workflows can be encoded to give arbitrarily low final error-rate output. The only limitation is when model priors are badly misaligned with the work that needs to be done; in that case you need hard steering via additional context.
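
To make the analogy concrete with a toy independence model (my simplification, not the actual Bayesian model being referenced): if each verification pass independently catches a defect with probability p, the surviving error rate shrinks geometrically with the number of passes.

  # Toy model (my assumption, not the parent's formulation): a defect survives
  # one independent check with probability (1 - p), so after k checks it
  # survives with probability (1 - p)**k, which can be driven arbitrarily low
  # by adding passes -- provided the checks are independent and p > 0
  # (i.e. the model's priors aren't misaligned with the task).
  def residual_error(base_error: float, catch_prob: float, passes: int) -> float:
      return base_error * (1.0 - catch_prob) ** passes

  # Example: 30% initial defect rate, each pass catches 60% of what remains.
  for k in range(6):
      print(k, round(residual_error(0.30, 0.60, k), 4))
  # 0 0.3, 1 0.12, 2 0.048, 3 0.0192, 4 0.0077, 5 0.0031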

JackSlateur 3 days ago | parent [-]

Which model gives you creative outputs?

CuriouslyC 3 days ago | parent [-]

Creative outputs start with Gemini (because of its long-context support, it can get longform stuff right), with successive refinement passes using Claude for line/copy edits (because it's the least purple).

lelanthran 3 days ago | parent | prev | next [-]

> You should rarely have to look at the code, it should be a significant escalation event

This is the bit I am having problems with: if you are rarely looking at the code, you will never have the skills to actually debug that significant escalation event.

dingnuts 3 days ago | parent | prev [-]

good fucking luck writing adequate test suites for qualitative business logic

if it's even possible, it will be more work than writing the code manually