new | show | ask | jobs Github

MeetingsBrowser 7 hours ago

The craziest thing about AI is you can just try it yourself and check if the claims are true.

I use Claude code and codex daily. They have become an integral part of my workflow.

There is no task that takes me a day that they can complete in five minutes.

Even with the lightning fast progress being made, it looks like LLMs are a decade or more away from being that good.

If AI can do your job for you, you should be the first to know. Just try it and see!

▲

johnfn 6 hours ago | parent | next [-]

There are definitely tasks you can prompt an AI in 5 minutes that would take a whole day to do. One example is adding something to a CI pipeline and getting it to green (i.e. maybe you're adding your first ever e2e test), especially when your CI pipeline is painfully slow. e.g. if your pipeline takes 30 minutes to finish, and it takes around 10 tries to figure out all the random problems, that was easily a full day task before AI. Now I prompt AI to figure it out, which takes 5 minutes of active attention, and it figures it out for the rest of the day while I do other stuff.

▲

rich_sasha 5 hours ago | parent | next [-]

People say LLMs do better on tasks where success is clear, like tests passing, and I can imagine it's true.

Still, I find complex code fixes confirmed by tests end in the LLM fudging the code to make the specific test pass, rather than fixing the general issue. Like, where successful code run should generate a file and the test checks for the file, eventually LLM will just touch the file regardless and be done.

	▲	wild_egg 5 hours ago \| parent [-]
		Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it. This has completely solved the cheating and fudging to make tests pass for me.

▲

MeetingsBrowser 6 hours ago | parent | prev | next [-]

There are definitely some tasks that AI has made 10x or 100x faster, but not the tasks that make up my day to day.

For me, there may be one thing I do every few months that AI is really good at.

The overwhelming majority of the work I do, LLM tooling is just ok at. Definitely faster overall, but with lots of human planning, hand holding and course correction.

I would estimate LLMs make me, on average 50% more productive , which is huge! But from my experience I cannot believe anyone is experiencing a 8h/5m multiple productivity boost overall

▲

Aurornis 6 hours ago | parent | prev [-]

I mean I wasn’t sitting around unproductively waiting for 30 minute CI runs to finish before LLMs came along, either.

I also like to use LLMs for background work on iterative tasks, but the way some people talk about work in the days before LLMs make me realize how we’re arriving at these claims that LLMs make us 10X more productive. If it took someone all day to do a few minutes of active work then I could see how LLMs would feel like a 10X or 50X productivity unlocker simply by not shutting down and doing nothing at the first sign of a pause.

▲

johnfn 6 hours ago | parent [-]

Count yourself as one of the lucky few that can pay a 0 minute context switching price to switch between whatever other productive work you were doing and debugging CI. Most people I speak to remark that continually switching between unrelated tasks significantly diminishes their productivity.

	▲	Aurornis 6 hours ago \| parent [-]
		The example above was talking about 30 minute wait times between being able to do work. Nobody is staring at the screen for 30 minutes in deep concentration while they wait for that turn to complete. They are context switching to something, but maybe it’s Hacker News or Reddit. There is always a context switch in scenarios like this.

▲

qudat 7 hours ago | parent | prev | next [-]

Fundamentally it cannot be much better than how well we can write the spec and then validate the results.

It’s always gonna be a multi shot process. And it can already write code good enough. That’s no longer the bottleneck.

Further, Qwen 27b is such an incredible masterpiece for coding and it can run on consumer hardware today. Anthropic/OpenAI are gonna give up on coding models very soon. There’s not gonna be any money in it when you can run your own local model for significantly cheaper.

Qwen27b is not SOTA but the value is insane. You can basically use it for small tasks and then route harder problems to opus or sonnet and boom you’ve said a lot of money.

▲

HeavyStorm 6 hours ago | parent | prev | next [-]

Not my experience. AI takes a lot less time doing tasks than myself. My current issue is that 2 out of 3 they don't produce the code that I want, so I either have to reprompt or do it myself. And the solution is simple: just accept their way; I'm just not there yet.

In any case, on that one time that AI works perfectly, it saves me hours of coding. So the potential is there...

▲

2ndorderthought 7 hours ago | parent | prev | next [-]

Super trivial to hand verify 350kloc changes for sure.

	▲	qayxc 6 hours ago \| parent [-]
		Quis custodiet ipsos custodes?

▲

muglug 5 hours ago | parent | prev | next [-]

For select tasks the latest LLMs can speed things up by an order of magnitude.

Best example I’ve found: translating code from one language to another where there’s a large corpus of existing acceptance tests.

	▲	casey2 4 hours ago \| parent [-]
		Again, no silver bullet. You will have to know what tasks it's capable of and how to elicit that solution. The bottleneck was never code the bottleneck still is solving the right problem in the right way.

▲

whstl 6 hours ago | parent | prev | next [-]

Yep. It depends so much on task, expectations, ability to express what you want and whether the problem has been solved elsewhere or not.

The results are always so ridiculously different.

	▲	lelanthran 6 hours ago \| parent [-]
		> The results are always so ridiculously different. Well... yes! It's not the same as running a program through a compiler 100k times and getting the same binary, it's... different: https://www.lelanthran.com/chap15/content.html

▲

oulipo2 5 hours ago | parent | prev | next [-]

When you do any meaningful work, that is, not "generate a website with a fancy UI", you very much realize that AI can not, in fact, "do the work". They constantly make mistakes, and you have to spend about as much time writing the spec and checking the code as you'd have written the code

So the effect is just merely some kind of acceleration of "boilerplate code writing", which is very impressive for beginner coders who are mostly doing automateable, trivial tasks, but much less so once you start doing real concurrency / threading / embedded / etc work

▲

Aeolun 6 hours ago | parent | prev | next [-]

> There is no task that takes me a day that they can complete in five minutes.

Five minutes is pushing it, but 15 minutes? Absolutely.

▲

MattGaiser 6 hours ago | parent | prev | next [-]

The delta isn't a day to 5 minutes, but a day to a half hour (where most of my larger tickets take)? Yes, especially as you don't need to watch it do its thing anymore.

To me, the lack of amazing productivity gains is that we have done nothing to speed up figuring out what to build and nothing to speed up getting code into production from pull request and in a lot of companies, code review is already saturated.

Also, the agents are good at figuring out problems for themselves, so I can ask it to set up a CI/CD pipeline, give it GitHub access, and it will just try things until it succeeds.

▲

potsandpans 4 hours ago | parent | prev | next [-]

Agreed with you. Non argumentative, just want to add to the convo: what's even more crazy is the cognitive dissonance around this idea.

> There is no task that takes me a day that they can complete in five minutes.

It's highly dependent on task. I was watching a podcast with Simon Wilson, where he said something like, (paraphrasing) "My whole selling point as a dev was that I could ship POCs / MVPs fast. Now that's somewhat obsolete."

It resonated with me because I feel like that also was a skill that I cultivated and excelled at. I agree with Simon's general thesis: that skill is largely dead. There are many pedants and detractors that will race to the defense of this art with various arguments to try to challenge the idea, but they simply do not hold up to reality. I have non-programmer friends with 10 dollar claude code subscriptions whipping up products to solve niche problems in their life / job.

I offered to help one of my friends who's working on generating math exams based on curricula and seed problem sets. I taught him how to use git, he pushed the repo, I looked at the repo and it wasn't clear he needed me. Everything I could do would be related to scale / reliability / optimization. They don't need any of that, they just need to prompt the ai to say, "go burn some subscription tokens for my AP Calc track this year." There's a whole saas and c2c industry built around this problem that this guy just solved for 10 bucks a month.

Of course, there's much more depth to engineering then just cranking out prototypes. There is still "real engineering" to be done, and software will likely start focusing more towards specification / verification.

But a lot of the industry was built around the idea of speed of delivery / time to market to explore product fit and rapidly iterate. IMO frontier llms (private and open weights) have this largely solved. I can build and test ideas that would have taken me a weekend last year in half a day now, the majority of that time I can be talking to the llm via matrix while I'm out in the world.

▲

soupspaces 7 hours ago | parent | prev [-]

[dead]