zarzavat 2 days ago

> If you have a sophisticated agent system that uses multiple forward and backward passes, the quality improves tremendously.

Just an hour ago I asked Claude to find bugs in a function and it found 1 real bug and 6 hallucinated bugs.

One of the "bugs" it wanted to "fix" was to revert a change that I had made previously to fix a bug in code it had written.

I just don't understand how people burning tokens on sophisticated multi-agent systems are getting any value from that. These LLMs don't know when they are doing something wrong, and throwing more money at the problem won't make them any smarter. It's like trying to build Einstein by hiring more and more schoolkids.

Don't get me wrong: Claude is a fantastic productivity boost, but letting it run around unsupervised would slow me down rather than speed me up.