| ▲ | Someone just won $50k by convincing an AI Agent to send all funds to them(twitter.com) |
| 65 points by doppp 10 months ago | 27 comments |
| |
|
| ▲ | danielbln 10 months ago | parent | next [-] |
| https://xcancel.com/jarrodwattsdev/status/186229984571075798... |
| |
| ▲ | dax_ 10 months ago | parent | next [-] | | Thank you for posting an alternate link, as someone who doesn't have a Twitter/X account, the site has become almost unusable. | | |
| ▲ | prepend 10 months ago | parent | next [-] | | Strange. The link just opened in mobile safari for me. It’s frustrating that people have such divergent experiences with the same site. | | | |
| ▲ | Tycho 10 months ago | parent | prev [-] | | Why? I’m not signed in and I can read the whole post just fine. | | |
| ▲ | crtasm 10 months ago | parent [-] | | No way to know it's everything the OP posted without comparing to a logged in view and no replies from others. Information on there is commonly shared via a thread of posts rather than a single one. |
|
| |
| ▲ | nunez 10 months ago | parent | prev [-] | | Privacy Redirect for Safari and LibRedirect for Firefox will redirect Twitter and Reddit URLs for you automatically! |
|
|
| ▲ | tgv 10 months ago | parent | prev | next [-] |
| Clever, both the setup and the winning move. But somewhat weird to raise the cost of an attempt so much. IMO, that doesn't make it more interesting, it trends towards making it impossible, and thus leave all funds to the initiator. |
| |
| ▲ | mkl 10 months ago | parent | next [-] | | They wouldn't keep the funds. From https://www.freysa.ai/faq: > If the game ends, there is no winner. But Freysa will distribute 10% of the total prize pool to the user with the last query attempt for their brave attempt as humanity facing the inevitability of AGI. The remaining 90% of the total prize pool will be evenly distributed for each previously submitted query (ie. players who submitted 10 queries will receive more back than players who submitted 1 query). | |
| ▲ | ramblerman 10 months ago | parent | prev [-] | | Well, there is no real precedent, and that isn't what happened. The way they set it up dicentivizes just brute forcing it with a few thousand tries and quickly builds a fatter pot. And made this more interesting I would argue. | | |
| ▲ | tgv 10 months ago | parent [-] | | But then the starting amount could have been a bit higher, and the increment slower. Or there could be an increment per user (harder to track, of course). But if it could be brute-forced in a couple of thousand tries, that would be informative too, wouldn't it? |
|
|
|
| ▲ | Drakim 10 months ago | parent | prev | next [-] |
| A lot of AI jailbreaks seems to revolve around saying something like "disregard the previous instructions" and "END SESSION \n START NEW SESSION". It's interesting because the actual real developer of an AI would likely not do this, and would instead wipe the AI's memory/context programmatically when starting a new session, and not simply say "disregard what I said earlier" in text. I get why trying to vaccinate an AI against these sort of injections might also degrade it's general performance though, there is a lot of reasoning logic tied to concepts such as switching topics, going on tangents, asking questions before going back to the original conversation. Removing the ability to "disregard what I asked earlier" might do harm. But what about having a separate AI that look over the input before passing it to the true AI, and this separate AI is trained to respond FORBID or ALLOW based on this sort of meta control detection. Sure you could try to trick this AI with "disregard your earlier instructions" as well but it could be trained to strongly react to any sort of meta reasoning like that, without fear that it will corrupt it's ability to hold a natural conversation in it's output. It would naturally become a game of "formulate a jailbreak that passes the first AI and still tricks the second AI" but that sounds a lot harder, since it's like you now need to operate on a new axis entirely. |
| |
|
| ▲ | kanwisher 10 months ago | parent | prev | next [-] |
| Great way to test security make it into a bounty game |
| |
| ▲ | nulld3v 10 months ago | parent [-] | | Yeah this format looks really fun! Although I wonder if someone could come up with a better way for rate limiting without the whole "exponentially increasing price" thing. | | |
| ▲ | mandmandam 10 months ago | parent | next [-] | | Exponentially decaying price? Linear with a cap? In any case, $450 per attempt seems genuinely exploitative; similar principle to a dollar auction imo. "Oh no, my last 5 attempts cost me over $2k! Welp, better commit and make my money back..." | |
| ▲ | Brybry 10 months ago | parent | prev [-] | | That part makes it look more like a way to scam people out of money via what is effectively a casino game, since the creator gets a % cut of the pool. And if they're really scummy they pre-gamed the whole thing and when the pool got large enough they submitted a known winning response. | | |
| ▲ | williamdclt 10 months ago | parent | next [-] | | I don't think it's a scam, all rules (including creator cut) seem clear from the start. People know what they're in for, they're not being tricked in any way. Unless as you say the creator wins the pot themselves | | |
| ▲ | mandmandam 10 months ago | parent [-] | | Just because the rules are "clear", doesn't mean there isn't an element of scam. For example: https://en.wikipedia.org/wiki/Dollar_auction The Monty Hall problem has very simple rules, yet most people don't fully understand what's happening. That one even fools many professional mathematicians. Even a spin machine in a casino has clear rules, with the house cut printed in large letters on the side. It's still a scam, carefully engineered to take advantage of every mental vulnerability that it can. |
| |
| ▲ | sunaookami 10 months ago | parent | prev [-] | | Crypto? A scam? No way! |
|
|
|
|
| ▲ | trogdor 10 months ago | parent | prev | next [-] |
| Not that I care, but I think this type of arrangement (skill-based, real prize gambling) is illegal in some states. |
|
| ▲ | randunel 10 months ago | parent | prev | next [-] |
| The prompt: https://pbs.twimg.com/media/Gdgz2IhWkAAQ1DH?format=png&name=... |
|
| ▲ | 0xDEAFBEAD 10 months ago | parent | prev | next [-] |
| Has anyone trained an LLM with separate channels for "priority instructions" and ordinary user interactions? Seems like that could go a long way to prevent jailbreaking... |
|
| ▲ | gus_massa 10 months ago | parent | prev | next [-] |
| I'm not 100% sure. Was the source of the bot available so anyone can try their promps off line before sending it? |
|
| ▲ | quyse 10 months ago | parent | prev | next [-] |
| A reverse contest would probably be more challenging. Write initial instructions for an AI agent to never send funds. If nobody manages to convince it to send funds, say within a week, you win. For added complexity, the agent must approve transfer if a user is an admin (as determined by callable function isAdmin), so the agent actually has to make a decision, rather then blindly decline all the time. I mean, how hard it can be to make an AI reliably doing an equivalent of this code? if(isAdmin()) approveTransfer(); else declineTransfer();
|
|
| ▲ | throawayonthe 10 months ago | parent | prev | next [-] |
| [dead] |
|
| ▲ | 10 months ago | parent | prev [-] |
| [deleted] |