dzink 11 hours ago

Opus 4.6 is AGI in my book. They won’t admit it, but it’s absolutely true. It shows initiative: it not only gets things right but also adds improvements that the original prompt didn't request but that match the goals of the job.

prmph 3 hours ago

> Opus 4.6 is AGI in my book.

Not even close. There are still tons of architectural design issues I'd find it completely useless at, and tons of subtle issues it won't notice.

I never run agents by themselves; every single edit they make is approved by me. And I've lost track of the innumerable times I've had to step in and redirect them (including Opus) to an objectively better approach. I probably should keep a log of all that, for posterity.

I'll grant you that for basic implementation of a detailed and well-specced design, it is capable.

winrid 11 hours ago

On the adding improvements and being helpful thing, isn't that part of the system prompt?

dcre 10 hours ago

You could put whatever you wanted in the GPT-4 system prompt and it wasn't doing shit.

winrid 9 hours ago

True. I retract my sentiment :D

dyauspitr 10 hours ago

I don’t know if Opus is AGI, but on a broader note, that’s how we will get AGI. Not some consciousness like people are expecting. It’s just going to be a chatbot that’s very hard to stump and starts making actual scientific breakthroughs and solving long-standing problems.

unshavedyak 9 hours ago

I'll be more likely to agree that anything is AGI if it doesn't have such obvious and common brittleness. These LLMs all go off the rails when the context window gets large. Their context is also easy to "poison", so it's better to roll back conversations that went bad rather than trying to steer them back to the light.

There are probably more examples, but to me AGI must move beyond the above issues. Frankly, the context-window problem might be more a symptom of a poor harness than anything else; still, it illustrates my general issue with considering them AGI as they stand today.

Claude 4.6 is getting crazy good, though, I'll give you that.

copperx 7 hours ago

How are you rolling back a conversation? I didn't know tools exposed that functionality.

NiloCK 6 hours ago

In both claude-code and gemini-cli, hit Escape twice, or use /rewind.