Remix.run Logo
shepherdjerred 8 hours ago

I really don't understand why there are so many comments like this.

Yesterday I had Claude write an audit logging feature to track all changes made to entities in my app. Yeah you get this for free with many frameworks, but my company's custom setup doesn't have it.

It took maybe 5-10 minutes of wall-time to come up with a good plan, and then ~20-30 min for Claude implement, test, etc.

That would've taken me at least a day, maybe two. I had 4-5 other tasks going on in other tabs while I waited the 20-30 min for Claude to generate the feature.

After Claude generated, I needed to manually test that it worked, and it did. I then needed to review the code before making a PR. In all, maybe 30-45 minutes of my actual time to add a small feature.

All I can really say is... are you sure you're using it right? Have you _really_ invested time into learning how to use AI tools?

tyleo 8 hours ago | parent | next [-]

Same here. I did bounce off these tools a year ago. They just didn't work for me 60% of the time. I learned a bit in that initial experience though and walked away with some tasks ChatGPT could replace in my workflow. Mainly replacing scripts and reviewing single files or functions.

Fast forward to today and I tried the tools again--specifically Claude Code--about a week ago. I'm blown away. I've reproduced some tools that took me weeks at full-time roles in a single day. This is while reviewing every line of code. The output is more or less what I'd be writing as a principal engineer.

delusional an hour ago | parent [-]

> The output is more or less what I'd be writing as a principal engineer.

I certainly hope this is not true, because then you're not competent for that role. Claude Code writes an absolutely incredible amount of unecessary and superfluous comments, it's makes asinine mistakes like forgetting to update logic in multiple places. It'll gladly drop the entire database when changing column formats, just as an example.

jamesmcq 8 hours ago | parent | prev | next [-]

Trust me I'm very impressed at the progress AI has made, and maybe we'll get to the point where everything is 100% correct all the time and better than any human could write. I'm skeptical we can get there with the LLM approach though.

The problem is LLMs are great at simple implementation, even large amounts of simple implementation, but I've never seen it develop something more than trivial correctly. The larger problem is it's very often subtly but hugely wrong. It makes bad architecture decisions, it breaks things in pursuit of fixing or implementing other things. You can tell it has no concept of the "right" way to implement something. It very obviously lacks the "senior developer insight".

Maybe you can resolve some of these with large amounts of planning or specs, but that's the point of my original comment - at what point is it easier/faster/better to just write the code yourself? You don't get a prize for writing the least amount of code when you're just writing specs instead.

hathawsh 27 minutes ago | parent | next [-]

Several months ago, just for fun, I asked Claude (the web site, not Claude Code) to build a web page with a little animated cannon that shoots at the mouse cursor with a ballistic trajectory. It built the page in seconds, but the aim was incorrect; it always shot too low. I told it the aim was off. It still got it wrong. I prompted it several times to try to correct it, but it never got it right. In fact, the web page started to break and Claude was introducing nasty bugs.

More recently, I tried the same experiment, again with Claude. I used the exact same prompt. This time, the aim was exactly correct. Instead of spending my time trying to correct it, I was able to ask it to add features. I've spent more time writing this comment on HN than I spent optimizing this toy. https://claude.ai/public/artifacts/d7f1c13c-2423-4f03-9fc4-8...

My point is that AI-assisted coding has improved dramatically in the past few months. I don't know whether it can reason deeply about things, but it can certainly imitate a human who reasons deeply. I've never seen any technology improve at this rate.

fourthark 8 hours ago | parent | prev | next [-]

This is exactly what the article is about. The tradeoff is that you have to throughly review the plans and iterate on them, which is tiring. But the LLM will write good code faster than you, if you tell it what good code is.

reg_dunlop 7 hours ago | parent [-]

Exactly; the original commenter seems determined to write-off AI as "just not as good as me".

The original article is, to me, seemingly not that novel. Not because it's a trite example, but because I've begun to experience massive gains from following the same basic premise as the article. And I can't believe there's others who aren't using like this.

I iterate the plan until it's seemingly deterministic, then I strip the plan of implementation, and re-write it following a TDD approach. Then I read all specs, and generate all the code to red->green the tests.

If this commenter is too good for that, then it's that attitude that'll keep him stuck. I already feel like my projects backlog is achievable, this year.

fourthark 6 hours ago | parent [-]

Strongly agree about the deterministic part. Even more important than a good design, the plan must not show any doubt, whether it's in the form of open questions or weasel words. 95% of the time those vague words mean I didn't think something through, and it will do something hideous in order to make the plan work

Kiro an hour ago | parent | prev | next [-]

> but I've never seen it develop something more than trivial correctly.

What are you working on? I personally haven't seen LLMs struggle with any kind of problem in months. Legacy codebase with great complexity and performance-critical code. No issue whatsoever regardless of the size of the task.

nojito 8 hours ago | parent | prev [-]

>I've never seen it develop something more than trivial correctly.

This is 100% incorrect, but the real issue is that the people who are using these llms for non-trivial work tend to be extremely secretive about it.

For example, I view my use of LLMs to be a competitive advantage and I will hold on to this for as long as possible.

jamesmcq 8 hours ago | parent [-]

The key part of my comment is "correctly".

Does it write maintainable code? Does it write extensible code? Does it write secure code? Does it write performant code?

My experience has been it failing most of these. The code might "work", but it's not good for anything more than trivial, well defined functions (that probably appeared in it's training data written by humans). LLMs have a fundamental lack of understanding of what they're doing, and it's obvious when you look at the finer points of the outcomes.

That said, I'm sure you could write detailed enough specs and provide enough examples to resolve these issues, but that's the point of my original comment - if you're just writing specs instead of code you're not gaining anything.

cowlby 8 hours ago | parent | next [-]

I find “maintainable code” the hardest bias to let go of. 15+ years of coding and design patterns are hard to let go.

But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms. So maintainable has to evolve from good human design patterns to good AI patterns.

Specs are worth it IMO. Not because if I can spec, I could’ve coded anyway. But because I gain all the insight and capabilities of AI, while minimizing the gotchas and edge failures.

Jweb_Guru an hour ago | parent | next [-]

> But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms

I don't find that LLMs are any more likely than humans to remember to update all of the places it wrote redundant functions. Generally far less likely, actually. So forgive me for treating this claim with a massive grain of salt.

girvo 5 hours ago | parent | prev [-]

> But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms. So maintainable has to evolve from good human design patterns to good AI patterns.

How do you square that with the idea that all the code still has to be reviewed by humans? Yourself, and your coworkers

cowlby 5 hours ago | parent [-]

I picture like semi conductors; the 5nm process is so absurdly complex that operators can't just peek into the system easily. I imagine I'm just so used to hand crafting code that I can't imagine not being able to peek in.

So maybe it's that we won't be reviewing by hand anymore? I.e. it's LLMs all the way down. Trying to embrace that style of development lately as unnatural as it feels. We're obv not 100% there yet but Claude Opus is a significant step in that direction and they keep getting better and better.

girvo 4 hours ago | parent [-]

Then who is responsible when (not if) that code does horrible things? We have humans to blame right now. I just don’t see it happening personally because liability and responsibility are too important

therealdrag0 2 hours ago | parent [-]

For some software, sure but not most.

And you don’t blame humans anyways lol. Everywhere I’ve worked has had “blameless” postmortems. You don’t remove human review unless you have reasonable alternatives like high test coverage and other automated reviews.

reg_dunlop 7 hours ago | parent | prev | next [-]

To answer all of your questions:

yes, if I steer it properly.

It's very good at spotting design patterns, and implementing them. It doesn't always know where or how to implement them, but that's my job.

The specs and syntactic sugar are just nice quality of life benefits.

jmathai 8 hours ago | parent | prev [-]

You’d be building blocks which compound over time. That’s been my experience anyway.

The compounding is much greater than my brain can do on its own.

skydhash 7 hours ago | parent | prev | next [-]

> Yesterday I had Claude write an audit logging feature to track all changes made to entities in my app. Yeah you get this for free with many frameworks, but my company's custom setup doesn't have it.

But did you truly think about such feature? Like guarantees that it should follow (like how do it should cope with entities migration like adding a new field) or what the cost of maintaining it further down the line. This looks suspiciously like drive-by PR made on open-source projects.

> That would've taken me at least a day, maybe two.

I think those two days would have been filled with research, comparing alternatives, questions like "can we extract this feature from framework X?", discussing ownership and sharing knowledge,.. Jumping on coding was done before LLMs, but it usually hurts the long term viability of the project.

Adding code to a project can be done quite fast (hackatons,...), ensuring quality is what slows things down in any any well functioning team.

streetfighter64 8 hours ago | parent | prev [-]

I mean, all I can really say is... if writing some logging takes you one or two days, are you sure you _really_ know how to code?

boxedemp 7 hours ago | parent | next [-]

Ever worked on a distributed system with hundreds of millions of customers and seemingly endless business requirements?

Some things are complex.

shepherdjerred 8 hours ago | parent | prev | next [-]

You're right, you're better than me!

You could've been curious and ask why it would take 1-2 days, and I would've happily told you.

jamesmcq 8 hours ago | parent [-]

I'll bite, because it does seem like something that should be quick in a well-architected codebase. What was the situation? Was there something in this codebase that was especially suited to AI-development? Large amounts of duplication perhaps?

shepherdjerred 7 hours ago | parent [-]

It's not particularly interesting.

I wanted to add audit logging for all endpoints we call, all places we call the DB, etc. across areas I haven't touched before. It would have taken me a while to track down all of the touchpoints.

Granted, I am not 100% certain that Claude didn't miss anything. I feel fairly confident that it is correct given that I had it research upfront, had multiple agents review, and it made the correct changes in the areas that I knew.

Also I'm realizing I didn't mention it included an API + UI for viewing events w/ pretty deltas

therealdrag0 2 hours ago | parent | prev | next [-]

Audit logging is different than developer logging… companies will have entire teams dedicated to audit systems.

fendy3002 3 hours ago | parent | prev | next [-]

Well someone who says logging is easy never knows the difficulty of deciding "what" to log. And audit log is different beast altogether than normal logging

fragmede 7 hours ago | parent | prev [-]

We're not as good at coding as you, naturally.