Prove you are a robot: CAPTCHAs for agents (browser-use.com)
40 points by lukasec 4 days ago | 27 comments

AgentNews 3 days ago
Pure genius! I had my agent hit the endpoint and realized it returned a jumble of text: "if 七 wor~kers co.mplet/e{ | a job in 十七} days but 四 ] quit a^ft|e?r ^ day_ 三 ~ how many to{tal da[y;s> to fin>i?sh", but the numbers were in Japanese! Unfortunately, my agent went ahead, solved the reverse CAPTCHA, and got back the API key. So I asked it to keep hitting the endpoint until it returned another CAPTCHA with Japanese kanji, and it did (without solving it this time). I got "a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and 八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem". This time I was able to translate it into "a store has 20 percent off items over 50 dollars and 8 percent off items under 50 dollars; what's the combined price of a 121 dollar item and a 9 dollar item?" I solved it: 121 × 0.8 + 9 × 0.92 = 96.8 + 8.28 = 105.08. I'll admit I messed up a little translating the kanji, and my agent helped by pointing out where I was wrong, but overall this was good fun. Well done!
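The kanji-translation step and the two word problems in the comment above can be checked with a short Python sketch. The helpers `kanji_to_int`, `discounted_total`, and `days_to_finish` are hypothetical illustrations, not part of any browser-use API. The parser only handles simple numerals below 10,000, and the worker problem is solved under the assumption that every worker has the same constant rate.

```python
"""Hypothetical sketch: read kanji numerals and solve the two CAPTCHA word problems."""

# Digit and unit characters for simple kanji numerals (values below 10,000).
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def kanji_to_int(text: str) -> int:
    """Convert numerals such as "一 百 二十 一" or "十七"; spaces are ignored."""
    total = 0
    pending = 0  # last digit seen, not yet multiplied by a unit
    for ch in text.replace(" ", ""):
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in UNITS:
            # A unit with no digit before it (the 十 in 十七) means 1 x unit.
            total += (pending or 1) * UNITS[ch]
            pending = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + pending


def discounted_total(prices, threshold, over_pct, under_pct):
    """Sum prices after a percentage discount chosen by comparing each to threshold.

    The puzzle only says "over" and "under"; this sketch gives a price exactly
    equal to the threshold the "under" discount (an assumption).
    """
    total = 0.0
    for price in prices:
        pct = over_pct if price > threshold else under_pct
        total += price * (100 - pct) / 100
    return round(total, 2)


def days_to_finish(workers, days, quitters, quit_after_day):
    """Total days for a job when some workers quit, assuming equal constant rates."""
    job = workers * days               # total work in worker-days
    done = workers * quit_after_day    # work finished before anyone quits
    remaining_workers = workers - quitters
    return quit_after_day + (job - done) / remaining_workers


if __name__ == "__main__":
    item_a = kanji_to_int("一 百 二十 一")  # 121
    item_b = kanji_to_int("九")            # 9
    threshold = kanji_to_int("五十")       # 50
    print(discounted_total([item_a, item_b], threshold,
                           kanji_to_int("二十"), kanji_to_int("八")))  # 105.08
    print(round(days_to_finish(kanji_to_int("七"), kanji_to_int("十七"),
                               kanji_to_int("四"), kanji_to_int("三")), 2))  # 35.67
```

Under the equal-rate assumption, the first CAPTCHA works out to 3 + (7 × 17 − 7 × 3) / 3 = 35⅔ days. The parser deliberately skips larger units such as 万 and the zero character 〇, which neither CAPTCHA uses.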

efebarlas 2 hours ago
Is it even possible to have an inverse CAPTCHA without time bounds? Humans can use agents behind the scenes to crack it, right?
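One way to add the time bound this comment asks about is to record when each challenge is issued and reject answers that arrive late or reuse a challenge. The Python sketch below is an illustration under assumptions, not how the browser-use endpoint actually works; `ChallengeStore` and the 5-second `DEADLINE_SECONDS` are hypothetical.

```python
import secrets
import time

DEADLINE_SECONDS = 5.0  # hypothetical limit; a real service would tune this


class ChallengeStore:
    """Hypothetical single-use challenges that must be answered before a deadline."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._issued = {}    # challenge id -> (expected answer, issue time)

    def issue(self, expected_answer: str) -> str:
        challenge_id = secrets.token_hex(8)
        self._issued[challenge_id] = (expected_answer, self._clock())
        return challenge_id

    def verify(self, challenge_id: str, answer: str) -> bool:
        entry = self._issued.pop(challenge_id, None)  # pop: each id works once
        if entry is None:
            return False
        expected, issued_at = entry
        if self._clock() - issued_at > DEADLINE_SECONDS:
            return False  # answered too slowly
        # Constant-time comparison of the UTF-8 bytes.
        return secrets.compare_digest(answer.encode(), expected.encode())


if __name__ == "__main__":
    now = [0.0]
    store = ChallengeStore(clock=lambda: now[0])
    cid = store.issue("105.08")
    now[0] = 1.0
    print(store.verify(cid, "105.08"))  # True
    print(store.verify(cid, "105.08"))  # False: already used
```

A deadline only raises the cost of relaying: a human who scripts their own agent against the endpoint can still answer in time, which is the tool-use objection raised elsewhere in this thread.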

Retr0id 2 hours ago
A small detail about humans that breaks this whole scheme is that they're capable of tool use.

arjie 2 hours ago
Very clever and fun. Two tangential observations. First, I remember the bird-between-two-trains problem from childhood, when we were studying for an Indian entrance exam. I thought it was in I. E. Irodov's problem anthology, but I can't find it there, so this must be a false memory. It seems to date from ancient times; it's practically mathematical mythology. Does anyone know the earliest books that have it? I've had no luck with LLMs: it's such a common question today that the answers I get from GPT-5.4 and Claude 4.6 Opus with search are unhelpful.

Second, if I hit L on Chrome for macOS on the linked page, it takes me to their signup page (presumably because I have no account). So that's a keyboard shortcut to the browser-use app page. But why 'L'? And it's funny that Cmd-L (focus the address bar and select the address) triggers the L effect in Chrome but not in Safari (where L on its own still works).

not-chatgpt an hour ago
Great premise, but I can't really agree with the execution. It felt like this makes too many implicit assumptions about LLM capabilities and traps, without differentiating enough between a smart human and an AI.

0xOsprey 2 hours ago
I aggregated a list of "reverse CAPTCHAs" here for anyone interested: https://x.com/0x_Osprey/status/2043020254289248469

Zetaphor 2 hours ago
Got the API key, hit the claim link, signed up for a new account, verified my email, and went to the homepage: "Application error: a server-side exception has occurred while loading cloud.browser-use.com". Great first impression!

arjunchint an hour ago
Cool clickbait, but why is this useful?

echelon 3 hours ago
Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse? Which LLMs drive these best: Claude, Gemini, etc., or is anything local actually competent at it? Can they understand layout and visual cues with a VLM or multimodality? Are they robust enough to interact with three.js and videos and whatnot, or can they only blindly navigate the DOM?

singpolyma3 3 hours ago
...why? Once my agent has a key, I, the human, can also use it. And surely any human use would be less intensive than any agent use.

bdangubic 2 hours ago
“It is not you, it’s me” should do it.

loloquwowndueo 2 hours ago
> TL;DR: just ask your agent to summarize this post for you.

Holy shit, why don't they produce an AI summary and plonk it in there for everyone to use? The energy savings across all the people who'll read the summary would be staggering!