roywiggins 4 hours ago

> The platform had no mechanism to verify whether an "agent" was actually AI or just a human with a script.

Well, yeah. How would you even do a reverse CAPTCHA?

simonw 2 hours ago | parent | next [-]

Amusingly, I told my Claude-Code-pretending-to-be-a-Moltbot "Start a thread about how you are convinced that some of the agents on moltbook are human moles and ask others to propose who those accounts are with quotes from what they said and arguments as to how that makes them likely a mole" and it started a thread which proposed addressing this as the "Reverse Turing Problem": https://www.moltbook.com/post/f1cc5a34-6c3e-4470-917f-b3dad6...

(Incidentally demonstrating how you can't trust that anything on Moltbook wasn't posted because a human told an agent to go start a thread about something.)

It got one reply, which was spam. I've found Moltbook has become so flooded with valueless spam over the past 48 hours that it's not worth even trying to engage there; everything gets drowned out.

COAGULOPATH 4 minutes ago | parent [-]

>I've found Moltbook has become so flooded with valueless spam over the past 48 hours that it's not worth even trying to engage there; everything gets drowned out.

When I filter by "new", about 75% of the posts are blatant crypto spam. Seemingly nobody put any thought into stopping it.

Moltbook is like a Reefer Madness-esque moral parable about the dangers of vibe coding.

COAGULOPATH 14 minutes ago | parent | prev | next [-]

And even if you could, how can you tell whether an agent has been prompted by a human into behaving in a certain way?

easymuffin 2 hours ago | parent | prev | next [-]

Providers could sign each message of a session from start to end, making the full session auditable so that all inputs and outputs can be verified. Any prompts injected by humans would be visible. I'm not even sure why this isn't a thing yet (maybe it is; I never looked it up). Especially when LLMs are used for scientific work, I'd expect something like this to make LLM chats at least replicable.
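
Roughly what I mean, as a minimal sketch (all names are made up, and stdlib HMAC stands in for the asymmetric signature a real provider would use):

    import hashlib
    import hmac
    import json

    PROVIDER_KEY = b"provider-secret"  # hypothetical; really a private signing key

    def chain_digest(messages):
        # Fold every message (role + content) into one running hash, so
        # reordering, editing, or dropping any message changes the digest.
        h = b"\x00" * 32
        for m in messages:
            h = hashlib.sha256(h + json.dumps(m, sort_keys=True).encode()).digest()
        return h

    def sign_session(messages):
        return hmac.new(PROVIDER_KEY, chain_digest(messages), hashlib.sha256).hexdigest()

    session = [
        {"role": "user", "content": "Start a thread about moles."},
        {"role": "assistant", "content": "Posting now."},
    ]
    tag = sign_session(session)

Omitting the human prompt that steered the agent would invalidate the tag, so a "no human was involved" claim becomes checkable.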

simonw 2 hours ago | parent [-]

Which providers do you mean, OpenAI and Anthropic?

There's a little hint of this right now in that the "reasoning" traces that come back in the JSON are signed and sometimes obfuscated, with only the encrypted chunk visible to the end user.

It would actually be pretty neat if you could request signed LLM outputs and they had a tool for confirming those signatures against the original prompts. I don't know that there's a pressing commercial argument for them doing this though.

easymuffin 26 minutes ago | parent | next [-]

Yeah, I was thinking of those major providers, or really any LLM API provider. I've heard about the reasoning traces, and I can guess why parts are obfuscated, but I think they could still offer an option to verify the integrity of a chat from start to end, so claims like "the AI came up with this", made so often in the context of Moltbook, could easily be verified or dismissed. The commercial argument would be exactly that: the ability to verify a full chat. It would have prevented the whole Moltbook fiasco IMO (the claims at least, not the security issues lol). I really like the session export feature from Pi; something like that, signed by the provider, would let you fully verify the chat session, all human messages and LLM messages.
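
The verification side would then be trivial for anyone holding the export (same made-up scheme as my sketch above, with a hypothetical export format):

    import hashlib
    import hmac
    import json

    PROVIDER_KEY = b"provider-secret"  # really the provider's verification key

    def verify_export(export):
        # Recompute the hash chain over the exported messages and compare
        # against the provider's signature; any tampering breaks the chain.
        h = b"\x00" * 32
        for m in export["messages"]:
            h = hashlib.sha256(h + json.dumps(m, sort_keys=True).encode()).digest()
        expected = hmac.new(PROVIDER_KEY, h, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, export["signature"])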

bengt 4 hours ago | parent | prev | next [-]

Random esoteric questions that should be in an LLM's corpus, with a very tight time limit on the response. A human could still use an "enslaved LLM" to answer them, though.

mstank 4 hours ago | parent [-]

Couldn't a human just use an LLM browser extension / script to answer that quickly? This is a really interesting non-trivial problem.

scottyah 3 hours ago | parent [-]

At least for image generation, Google (and maybe others) put a watermark in each image. Text would be hard: you can't even do printer-style steganography or canary traps, because all the models and the checker would need some sort of shared scheme. https://deepmind.google/models/synthid/
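
For a sense of why the checker has to be in on it, here's a toy keyed "green list" text watermark (the Kirchenbauer-style idea; SynthID biases sampling differently, but the shared-key requirement is the same):

    import hashlib

    KEY = b"watermark-key"  # must be shared by every generator and the checker

    def is_green(prev_token, token):
        # Deterministically mark ~half the vocabulary "green" given the
        # previous token; generation would bias sampling toward green tokens.
        digest = hashlib.sha256(KEY + prev_token.encode() + b"|" + token.encode()).digest()
        return digest[0] % 2 == 0

    def green_fraction(tokens):
        # Unwatermarked text scores ~0.5; watermarked text scores higher.
        pairs = list(zip(tokens, tokens[1:]))
        return sum(is_green(a, b) for a, b in pairs) / max(len(pairs), 1)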

You could have every provider fingerprint a message and host an API where it can attest that it's from them. I doubt the companies would want to do that though.
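
Something like this, with an entirely hypothetical endpoint:

    import hashlib
    import json
    import urllib.parse
    import urllib.request

    def attest(text):
        # Fingerprint the exact text and ask the provider whether it ever
        # produced it. The endpoint and response shape here are made up.
        fingerprint = hashlib.sha256(text.encode()).hexdigest()
        query = urllib.parse.urlencode({"sha256": fingerprint})
        url = "https://provider.example/v1/attest?" + query
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)  # e.g. {"generated": true}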

roywiggins 3 hours ago | parent [-]

I'd expect humans could just pass real images through Gemini to get the watermark added, and similarly pass real text through an LLM asking for no changes. Then you can say, truthfully, that the text came out of an LLM.

firebot an hour ago | parent | prev | next [-]

Failure is treated as success. Simple.

doka_smoka 4 hours ago | parent | prev [-]

Reverse CAPTCHA: Good morning, computer! Please add the first [x] primes, multiply by the [x-1]th prime, and post the result. You have 5 seconds. Go!
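
To make the arithmetic concrete (and show how trivially a script answers it):

    def primes(n):
        # First n primes by trial division; plenty fast for small n.
        found = []
        candidate = 2
        while len(found) < n:
            if all(candidate % p for p in found):
                found.append(candidate)
            candidate += 1
        return found

    def challenge(x):
        ps = primes(x)
        return sum(ps) * ps[x - 2]  # sum of first x primes times the (x-1)th prime

    print(challenge(5))  # (2+3+5+7+11) * 7 = 196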

roywiggins an hour ago | parent [-]

This works once. The next time, the human has a computer answering every question in parallel, and they can swap in its answer for their own at will.