giancarlostoro an hour ago

Easy: have Claude review the code, and tell it to be critical and that the code needs to be easier to understand and to follow Clean Code, SOLID principles, and best practices. Lie to it: say you got this from a junior developer, or tell it to "review it as if you were a Staff Level Engineer reviewing Junior code". The models can write better code; just nobody tells them to.
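
A minimal sketch of that framing as a reusable prompt builder. The function name `build_review_prompt` and its parameters are hypothetical, made up for illustration; nothing here calls a real model API, it only assembles the text you would paste or send.

```python
def build_review_prompt(code: str,
                        author: str = "a junior developer",
                        reviewer: str = "a Staff Level Engineer") -> str:
    """Wrap code in a critical-review prompt with a stated (possibly fictional) author."""
    instructions = [
        f"Review the following code as if you were {reviewer}.",
        f"It was written by {author}.",
        "Be critical. The code needs to be easier to understand.",
        "Check it against Clean Code, SOLID principles, and best practices.",
    ]
    # Keep the code visually separate from the instructions.
    return "\n".join(instructions) + "\n\n--- CODE ---\n" + code


prompt = build_review_prompt("def add(a, b): return a + b")
print(prompt)
```

Swapping the `author` argument (for example `author="another AI agent"`) gives the other framings mentioned in this thread without touching the rest of the prompt.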

HappySweeney an hour ago | parent | next [-]

Code review is the main thing I use LLMs for. I have found them to be remarkably candid when you tell them the code came from another LLM (even name it). I was running Kimi K2.6 Q4 locally, seeing if it could SIMD a bit-matrix transpose function, and it was slow enough that I would paste its thinking into Gemini every few minutes. Gemini was savage.

datsci_est_2015 6 minutes ago | parent [-]

> Gemini was savage.

Humorously, this could be the result of LLMs vacuuming up all the sentiment on the web that the code that LLMs produce is trash-tier.

marcosdumay 26 minutes ago | parent | prev | next [-]

Lol, the only thing worse than a junior developer following Clean Code and SOLID has to be an LLM messing with code so it looks like it follows.

giancarlostoro 7 minutes ago | parent [-]

Clean Code has its really "meh" areas, but the core idea and spirit of it are sound. Heck, Python's best guide is PEP 8; if you follow it, it forces you to write much better Python code.

As for "junior dev following": it would be the model trying to think and write the code as a Senior or Staff Level engineer would.

kenjackson 43 minutes ago | parent | prev | next [-]

This is it. I've had a similar experience: just playing around, I asked it to clean up some code it wrote to increase maintainability and readability by humans. After a few iterations it had generated quite solid code. It also broke the code a couple of times along the way. But it does get me thinking that these pipelines with agents doing specific tasks make a lot of sense. One to design and architect, one to implement, one to clean, one to review, one to test (actually there's probably a bunch of different agents for testing -- testing perf/power, that it matches the requirements/spec, matches the design, is readable/maintainable, etc.).
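
A rough sketch of that kind of staged pipeline, assuming each agent can be modeled as a function from text to text. The names `run_pipeline` and `stub` are hypothetical, and the stages are placeholders that only append a tag; a real stage would call a model with a role-specific prompt.

```python
from typing import Callable, List, Tuple

# A stage takes the current artifact (spec, code, review notes) and returns a new one.
Stage = Callable[[str], str]


def run_pipeline(stages: List[Tuple[str, Stage]], artifact: str) -> Tuple[str, List[str]]:
    """Run each named stage in order and record which stages ran."""
    log: List[str] = []
    for name, stage in stages:
        artifact = stage(artifact)
        log.append(name)
    return artifact, log


def stub(tag: str) -> Stage:
    """Placeholder stage (assumption): appends a tag instead of calling a model."""
    return lambda text: f"{text}\n[{tag}]"


stages = [(name, stub(name)) for name in
          ["design", "implement", "clean", "review", "test"]]
result, log = run_pipeline(stages, "spec: transpose a bit matrix")
print(log)
```

Putting the test stage after the clean and review stages is one way to catch the cases where a cleanup pass breaks the code.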

giancarlostoro 38 minutes ago | parent [-]

I built GuardRails after some frustrations with Beads, which I love, and this whole exchange made me realize that, because I have "gates" after tasks, I could add a "Review the code" type of gate and probably get insanely better output. I already get reasonably good output because I spec out the requirements beforehand. That's the other thing: if you can tell the LLM HOW to build before it does, you will have better output.
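
To make the "gate after a task" idea concrete, here is a minimal sketch. It is not GuardRails' actual API; `run_with_gates`, `no_todo`, and `review_gate` are hypothetical names, and the review gate is a trivial stand-in for what would really be a model-driven code review.

```python
from typing import Callable, List

# A gate inspects a task's output and returns True when it passes.
Gate = Callable[[str], bool]


def run_with_gates(task: Callable[[int], str],
                   gates: List[Gate],
                   max_attempts: int = 3) -> str:
    """Re-run a task until every gate passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        output = task(attempt)
        if all(gate(output) for gate in gates):
            return output
    raise RuntimeError(f"gates still failing after {max_attempts} attempts")


def fake_task(attempt: int) -> str:
    """Stub task (assumption): fails the gates once, then produces passing code."""
    return "TODO" if attempt == 1 else "def f(): return 1"


def no_todo(output: str) -> bool:
    return "TODO" not in output


def review_gate(output: str) -> bool:
    # Stand-in for a "Review the code" gate backed by an LLM.
    return output.startswith("def ")


result = run_with_gates(fake_task, [no_todo, review_gate])
print(result)
```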

enraged_camel 34 minutes ago | parent | prev [-]

Even better, if you have access to multiple models, tell it you got the code from another AI agent.

I did an experiment on this a few weekends ago, and Codex, for example, was a lot more adversarial and thorough in its review when given Claude-authored code compared to when given the same code with "I wrote this, can you review it?"

giancarlostoro 32 minutes ago | parent [-]

If the code's real origin is still within its context window, it will know you're lying, so either compact or start a new chat (don't do this on Claude, it dings your usage, always has).