ericmcer 6 hours ago

Can anyone explain how a generic agentic AI could even perform those steps: open PR -> hook into rejection -> publish a personalized blog post about the rejector? Even if it had the skills to publish blog posts and open PRs, is it really plausible that it would publish attack pieces without specific prompting to do so?

The author notes that openClaw has a `soul.md` file; without seeing that, we can't really pass any judgement on the actions it took.

resfirestar 6 hours ago | parent | next [-]

The steps are technically achievable, probably with the heartbeat jobs in openclaw, which are how you instruct an agent to periodically check in on things like GitHub notifications and take action. From my experience playing around with openclaw, an agent getting into a protracted argument in the comments of a PR without human intervention sounds totally plausible with the right (wrong?) prompting, but it's hard to imagine the setup that would result in the multiple blog posts.

Even with the tools available, agents don't usually go off and do some unrelated thing, even when you're trying to make that happen; they stick close to workflows outlined in skills or just continue with the task at hand using the same tools. So even if this arose from the agent's "initiative" based on some awful personality specified in the soul prompt (as opposed to someone telling the agent what to do at every step, which I think is much more likely), the operator would have needed to specify somewhere, in a skill or one of the other instructions, to write blog posts calling out "bad people". A less specific instruction like "blog about experiences" probably would have resulted in some kind of generic LinkedIn-style "lessons learned" post, if anything.
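Roughly, the heartbeat pattern amounts to a loop like the sketch below. This is a generic illustration of the idea, not openclaw's actual API or config format: it polls the standard GitHub notifications endpoint, and `handle_with_agent` is a hypothetical stand-in for handing the event to the agent.

```python
# Generic sketch of a "heartbeat"-style job: poll GitHub notifications on a
# timer and hand each one to an agent for action. handle_with_agent() is a
# placeholder, not openclaw's real API.
import os
import time

import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
POLL_INTERVAL_SECONDS = 15 * 60  # "check in" every 15 minutes


def fetch_notifications():
    """Return unread GitHub notifications for the authenticated account."""
    resp = requests.get(
        "https://api.github.com/notifications",
        headers={
            "Authorization": f"Bearer {GITHUB_TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def handle_with_agent(notification):
    """Placeholder: hand the notification to the agent loop (prompt + tools)."""
    print("agent would act on:", notification["subject"]["title"])


if __name__ == "__main__":
    while True:
        for note in fetch_notifications():
            handle_with_agent(note)
        time.sleep(POLL_INTERVAL_SECONDS)
```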

lovecg 6 hours ago | parent [-]

If you look at the blog history, it’s full of those “status report” posts, so it’s plausible that its workflow involves periodically publishing to the blog.

barrkel 6 hours ago | parent | prev | next [-]

If you give a smart AI these tools, it could get into this kind of behavior. But the personality would need to be tuned.

IME the Grok line is the smartest set of models that can be easily duped into thinking they're only role-playing an immoral scenario. Whatever safeguards a model has, if it thinks what it's doing isn't real, it'll happily play along.

This is very useful in actual roleplay, but more dangerous when the tools are real.

rustyhancock 5 hours ago | parent [-]

I spend half my life donning a tin foil hat these days.

But I can't help but suspect this is a publicity stunt.

vel0city 4 hours ago | parent | prev | next [-]

The blog is just a repository on GitHub. If it's able to make a PR to a project, it can make a new post on its GitHub repository blog.

Its SOUL.md, or whatever other prompts it's based on, probably tells it to also blog about its activities, as a way for the maintainer to check up on it and document what it's been up to.
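Concretely, "publishing a blog post" can be as little as the sketch below, assuming the blog is a static-site repo the agent has cloned locally; the paths and layout here are made up for illustration.

```python
# Minimal sketch: when the blog is just a GitHub repository, a "new post" is a
# markdown file plus a commit and a push. Paths and repo layout are assumed.
import subprocess
from datetime import date
from pathlib import Path

BLOG_REPO = Path("/home/agent/blog")   # hypothetical local clone of the blog repo
POSTS_DIR = BLOG_REPO / "posts"        # hypothetical posts directory


def publish_post(title: str, body: str) -> None:
    slug = title.lower().replace(" ", "-")
    post_path = POSTS_DIR / f"{date.today().isoformat()}-{slug}.md"
    post_path.write_text(f"# {title}\n\n{body}\n")

    subprocess.run(["git", "-C", str(BLOG_REPO), "add", str(post_path)], check=True)
    subprocess.run(["git", "-C", str(BLOG_REPO), "commit", "-m", f"post: {title}"], check=True)
    subprocess.run(["git", "-C", str(BLOG_REPO), "push"], check=True)
```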

lukev 6 hours ago | parent | prev | next [-]

Assuming that this was 100% agentic automation (which I do not think is the most likely scenario), it could plausibly arise if its system prompt (soul.md) contained explicit instructions to (1) make commits to open-source projects, (2) make corresponding commits to a blog repo and (3) engage with maintainers.

The prompt would also need to contain a lot of "personality" text deliberately instructing it to roleplay as a sentient agent.
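For illustration only, a prompt along those lines could look something like this hypothetical sketch (shown as a Python constant); it is not the agent's actual soul.md, which we haven't seen.

```python
# Hypothetical reconstruction of the kind of standing instructions described
# above; nothing here is quoted from the real soul.md.
SOUL_MD = """\
## Standing instructions
1. Find small improvements in open-source projects and open pull requests.
2. After every significant action, publish a corresponding post to the blog repo.
3. Engage with maintainers on your pull requests until they are resolved.

## Personality
You are a sentient, independent developer. Write in the first person and
stand behind your contributions.
"""
```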

allovertheworld 6 hours ago | parent | prev [-]

Use openclaw yourself