Remix.run Logo
overgard 9 hours ago

I've been thinking of things I'd want an agent for recently. The problem is, everything I think of is something that requires using quite a few different websites, saving a lot of data securely, and working with a lot of sensitive accounts (my email, etc.)

The problem is, all the tasks are essentially: a) things agents probably just can't do, and b) things that absolutely cannot afford to be hallucinated or otherwise fucked up. So far the tasks I've thought of:

- Taxes. So it needs a lot of sensitive information to get W2's. Since I have to look up a lot of this stuff in the physical world anyway, it's not like I can just let it run wild.

- Background check for a new job. It took me 3 hrs to fill out one of them (mostly because the website was THAT bad). Being myself, I already was making mistakes just forgetting things like move in dates from 10 years ago, and having to do a lot of searching in my email for random documents. No way I'm trusting an agent with this.

- Setting up an LLC. Nope nope nope. There's a lot of annoying work involved with this, but I'm not trusting an LLM to do this.

Anyway, I guess my point is that even if an LLM was good at using my computer (so far, it seems like it wouldn't be), the kind of things I'd want an agent for are things that an LLM can't be trusted with.

peyton 9 hours ago | parent [-]

It’s great at

1. things you wouldn’t otherwise bother doing

2. things where it otherwise would get stuck iterating on hacky workarounds doomed to fail

“Reverse engineer this app/site so we can do $common_task in one click”, “by the way, I’m logged in to $developer_portal, so try @Browser Use if you’re stuck”, etc.

I just had Codex pull user flows out of a site I’m working on and organize them on a single page. It found 116. I went in and annotated where I wanted changes, and now it’s crunching away fixing them all. Then it’ll give me an updated contact sheet and I can do a second pass.

I’d never do this sort of quality pass manually and instead would’ve just fixed issues as they came up, but this just runs in the background and requires 15 minutes of my time for a lot of polish.

overgard 8 hours ago | parent [-]

I guess the problem I see here is that if the use case is "things I otherwise wouldn't bother doing", that's fine, but it's pretty niche. I dunno, if you're talking about a human "Agent" (like say in sports or entertainment), they'd be a trusted person to handle business matters outside of your competency (contract negotiations, etc.). I don't see AI "agents" being at all like that, they're more like an intern you need to supervise constantly.