benjiro 3 days ago

That is the thing... How long ago did we get Agent mode? In Copilot, that thing is only 7 months old.

Things evolve faster than people realize... First Agent mode, then came MCP servers, sub-agents, and now it's RAG databases allowing the LLMs to get data directly.

The development of LLMs looks slow, but with each iteration things get improved. Ask yourself: what would have been the result of those same tests you ran, 21 months ago, with Claude 3.0? How about Claude 4.0, which is only 8 months old?

Right now Opus 4.5 is darn functional. The issue is more often not the code that it writes, but that it gets stuck on "it's too complex, let me simplify it", with the biggest issue often being context capacity.

LLMs are still bad at deeper tasks, but compared to the last LLMs, the jumps have been enormous. What about a year from now? Two years? I have a hard time believing that Claude 3 was not even 2 years ago, just 21 months. And we considered that a massive jump up, useful for working on a single file... Now we are throwing entire codebases at it, and it is darn good at debugging, editing, etc.

Do I like the results? No, there are lots of times that the results are not what "I wanted", but that is often a result of my own prompting being too generic.

LLMs are never going to really replace experienced programmers, but boy is the progress scary.

aeonfox 3 days ago

I can't say my opinion has changed. It didn't give me results that were more exciting or useful than Sonnet's. Is it worth 3x the price per token? I'm not so sure.

(It wasn't clear in my comment, but I already use agents for my code. I just think the OP's claims are overblown.)