| ▲ | adamtaylor_13 3 days ago |
| I am an engineer. I hire other engineers. I run a company that ships usable software for small businesses. We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks. |
|
| ▲ | MeetingsBrowser 3 days ago | parent | next [-] |
| As a software engineer who also hires other software engineers, I'm curious about the disconnect in our experiences. I do systems programming. Before AI, feature development roughly went design, implement, test, review, with some back edges and a lot of time spent in test and review. AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though it's still an improvement overall. We do not see the weeks-to-days improvement, though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now. What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much? |
| |
| ▲ | satvikpendem 3 days ago | parent | next [-] | | > I do systems programming. I'll stop you right there. AI is not good at systems programming; it's good at CRUD web development, which is where most people are seeing the gains. | | |
| ▲ | oytis 3 days ago | parent | next [-] | | I think antirez mentioned somewhere he considered it particularly good at systems programming. | | |
| ▲ | satvikpendem a day ago | parent [-] | | Depends what it's used for. Generally, I've seen that due to the paucity of C or Rust (etc.) training data versus JavaScript and TypeScript, LLMs aren't as good at the former as at the latter. |
| |
| ▲ | dboreham 2 days ago | parent | prev | next [-] | | This is a myth in my experience. LLMs are good at all the kinds of programming I've tried using them on, including many cases that are very far from "CRUD web development". | |
| ▲ | Traubenfuchs 3 days ago | parent | prev [-] | | >95% of software development is crud. | | |
| ▲ | id 3 days ago | parent [-] | | It's really not, though. As soon as systems have to scale, regulatory requirements come in, etc., it becomes more complex. AI has solved simple CRUD, yes, but CRUD was easy before. |
|
| |
| ▲ | kakacik 3 days ago | parent | prev | next [-] | | Anytime you hear such wild claims, imagine a typical code sweatshop (not just CRUD apps but templated e-shops, business pages, etc.), not a system that will evolve for another 10-20 years beyond the initial implementation and is the backend cornerstone of some part of a corporation. That is, in the case it's actually true; there is a ton of PR happening here, plus another gigaton of uncritical fanboyism, as with any hot topic. There may be an additional corner case or 20 where it's still valid, but those are not your typical software engineering work. I share your experience: even a 100x code delivery improvement would barely move the needle on project delivery in our place. Better, more automated integration and end-to-end functional tests that reflect real-world usage and data flows would make a much bigger difference, and there's no reason to think LLMs couldn't deliver this in the near future. | |
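The "automated integration and end-to-end functional tests which reflect real world usage/data flows" mentioned above can be sketched in miniature. This is a hypothetical illustration, not code from the thread: the tiny in-memory `InventoryApp` and its flows are invented stand-ins for the real idea, which is driving a whole user workflow through the application boundary and asserting on observable state.

```python
# Hypothetical sketch of an end-to-end functional test: exercise a
# complete user flow in order, then assert on what a user would see.
# InventoryApp is an invented stand-in for a real application; a real
# suite would drive a deployed service (e.g. over HTTP) the same way.

class InventoryApp:
    """Toy application boundary: receive stock, sell stock, report."""

    def __init__(self):
        self._stock = {}

    def receive(self, sku, qty):
        self._stock[sku] = self._stock.get(sku, 0) + qty

    def sell(self, sku, qty):
        if self._stock.get(sku, 0) < qty:
            raise ValueError("insufficient stock")
        self._stock[sku] -= qty

    def stock_report(self):
        return dict(self._stock)


def test_receive_then_sell_flow():
    # The same sequence of steps a real user would perform, end to end.
    app = InventoryApp()
    app.receive("WIDGET-1", 10)
    app.sell("WIDGET-1", 4)
    assert app.stock_report() == {"WIDGET-1": 6}


def test_overselling_is_rejected():
    # A real-world edge case, not just the happy path.
    app = InventoryApp()
    app.receive("WIDGET-1", 1)
    try:
        app.sell("WIDGET-1", 2)
        assert False, "expected the oversell to be rejected"
    except ValueError:
        pass
```

A production end-to-end suite would hit the running system with production-like data, but the shape is the same: whole flows in user order, plus the unhappy paths.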
| ▲ | stavros 3 days ago | parent | prev | next [-] | | Not the OP, but it might be that AI isn't as good at systems programming as it is at other domains, or it might be that you're using it differently than I am. I don't know which one it is (maybe AI just isn't good at writing the language you work with). For things like web frontends/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships far fewer bugs than our average senior programmer, though maybe not fewer than our staff programmers. | | |
| ▲ | rustystump 3 days ago | parent [-] | | In my experience, AI has had far, far more bugs than most of what I call senior engineers, but far fewer than juniors. The boost is for what are glorified CRUD apps, where it 1000x's the tedious work. However, the choices it makes along the way quickly blow up without cleanup. Seniors know how to keep their workspace clean, or they should. | | |
| |
| ▲ | skeptic_ai 3 days ago | parent | prev | next [-] | | I had never touched Kubernetes, and within one week I have a few nodes running and understand a lot of it. Not perfect, but not bad. | | |
| ▲ | oytis 3 days ago | parent | next [-] | | I have recently learned Kubernetes without AI and one week is more than enough to understand most of it. | | |
| ▲ | newphone733 3 days ago | parent [-] | | This is definitely not true. But I doubt the GP understands "most" of Kubernetes either. They probably have a good working knowledge of the important, commonly used features. | | |
| ▲ | weakfish 3 days ago | parent [-] | | …it definitely is true; I spun up a cluster at home to learn it for a new job and felt comfortable with the basics within a few days. |
|
| |
| ▲ | thrawa8387336 3 days ago | parent | prev | next [-] | | That was the usual experience pre AI | |
| ▲ | 3 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | 3 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | logicchains 3 days ago | parent | prev | next [-] | | >AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, Maybe they're using AI for testing and reviewing more than you are, not just for coding? | | |
| ▲ | MeetingsBrowser 3 days ago | parent [-] | | The "AI implementation" step in my workflow includes separate agents dedicated to testing and reviewing changes. The self-feedback loop catches a lot of errors and mistakes, but it rarely produces working code in one go. In my experience, the generated code handles the happy path but isn't great with edge cases or writing clean code, even with explicit instruction in the initial prompt. We usually end up doing multiple iterations on what Claude/Codex output, pointing out issues, asking for changes, etc. |
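The implement/test/review loop described in the comment above can be sketched as a toy. The "agent" here is a stub that walks through canned drafts; in a real setup each round would prompt an LLM (such as Claude or Codex) with the test failures. Everything below, including the `clamp` task, is an invented illustration, not an actual agent API.

```python
# Toy sketch of an implement -> test -> review -> iterate loop.
# A stub "agent" produces successive drafts of a clamp() function;
# the test harness reports failures, which would normally be fed
# back into the next prompt.

def run_tests(code):
    """Stand-in test harness: exec the draft and exercise clamp()."""
    ns = {}
    exec(code, ns)
    clamp = ns["clamp"]
    failures = []
    if clamp(5, 0, 10) != 5:
        failures.append("happy path")
    if clamp(-3, 0, 10) != 0:   # edge case: below the lower bound
        failures.append("lower bound")
    if clamp(99, 0, 10) != 10:  # edge case: above the upper bound
        failures.append("upper bound")
    return failures

# Canned "drafts" simulating an agent that handles the happy path
# first and only fixes the edge cases after review feedback.
DRAFTS = [
    "def clamp(x, lo, hi):\n    return x",
    "def clamp(x, lo, hi):\n    return max(x, lo)",
    "def clamp(x, lo, hi):\n    return min(max(x, lo), hi)",
]

def agent_loop(max_iterations=5):
    for i, draft in enumerate(DRAFTS[:max_iterations], start=1):
        failures = run_tests(draft)
        if not failures:
            return i, draft  # working code, and how many rounds it took
        # A real loop would feed these failures back into the prompt.
        print(f"iteration {i}: failed {failures}")
    raise RuntimeError("agent never converged")

iterations, final = agent_loop()
print(f"converged after {iterations} iterations")
```

In this sketch the happy path passes on the first draft while the edge cases take two more rounds, which mirrors the experience described: the loop converges, but rarely in one go.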
| |
| ▲ | b0rtb0rt 2 days ago | parent | prev | next [-] | | I work on cutting-edge C++ systems programming, and we are using Codex for everything now. Honestly, it's pretty impressive what it can do. | |
| ▲ | adamtaylor_13 a day ago | parent | prev [-] | | We design and build the software systems that our clients' businesses run on. So it's not the product; it's the system that allows them to run their business. Typically, it's less "QuickBooks" and more "let QuickBooks talk to 10 different systems," with custom functionality built on top. It's glue, custom business workflows, and basic web CRUD stuff. We build almost everything on Rails unless there's a critical reason not to (e.g., maintaining an existing system versus building from scratch). With very few exceptions, our team composition is one senior engineer paired with one business, so we avoid a large amount of SDLC busywork, namely inter-team communication. This leaves more time for client<->engineer communication, which has a host of additional benefits. We also build with a "North Star" methodology, which keeps everyone, including the client, laser-focused on the work at hand. To answer your final question about how we're benefiting so much from AI: I think it's primarily that we're leaning into it for implementation, testing, and review alike. I know it's a sin to let AI review AI, but... it works. I'm actively skeptical of it myself, but our error and rework rates don't lie. And we've got clients in various stages of development and/or long-term support. It's not like we're just hammering a bunch of stuff out and then bouncing. Most of these are multi-year, tightly integrated projects with our clients, and we don't see the lack of trust or the frustration you'd expect if we were shipping slop. Our Honeybadger errors typically stay at zero, our performance metrics are acceptable across the board, and most importantly, our clients love the work we're doing. I can't think of any other way to measure the quality of what we're doing, and by those metrics, AI has made us better, not worse. I should write a blog post outlining more of this in detail. |
|
|
| ▲ | pron 3 days ago | parent | prev | next [-] |
| The only way you could possibly know that is if you're reviewing the code, which means you're not "managing fleets of agents". If you're not reviewing the code (and you wouldn't be if you're managing fleets of agents), then you have no way to tell what you're shipping. |
| |
| ▲ | strogonoff 3 days ago | parent | next [-] | | It’s under-appreciated that a proper review takes at least as long as the actual work: it’s all the same time spent understanding the challenge and coming up with the best solution, minus the time spent typing in your solution (almost never a significant amount), plus the time spent understanding their solution and explaining how to get from theirs to yours. | |
| ▲ | adamtaylor_13 a day ago | parent | prev [-] | | Correct. We do review the code, and we're not managing "fleets of agents". My experience has generally been that the "fleet" approach is not very effective. |
|
|
| ▲ | maccard 3 days ago | parent | prev | next [-] |
| Can you link to a changelog that shows the 5-10x feature increases? I keep hearing this, but I don’t see anything I use ever actually shipping like this, or people backing this up with any sort of proof. |
| |
| ▲ | adamtaylor_13 a day ago | parent | next [-] | | Our projects are closed source, since our clients own the code, but I can offer an anecdote. We have a client whose business operates on 2-3 very niche SaaS applications in the veterinary/animal medicine space. In a span of about 6 months, we completely ripped out 2 of those 3, and we are working on replacing the 3rd right now. We've done this with a single senior engineer working with the client 20-40 hours per week, with no major regressions. The business has been able to continue working as usual, with no disruptions throughout this process. Obviously it's hard to measure this objectively, but I can't imagine having done this pre-AI with zero downtime, replacing those SaaS applications in that timeframe. | |
| ▲ | toraway 2 days ago | parent | prev [-] | | That reminds me of a chart I saw posted in HN comments recently, tracking bullet points per day in the Claude Code release notes, which was cited as "proof of a step change" in AI development over the last year. It showed a dozen or so per day on average, which jumped to over 50 one month and stayed around that number. (Not the exact same chart, but a similar idea; I guess it's sort of a meme: https://imgur.com/a/YrNGYOR) So I looked at the most recent CC release notes on GitHub, and the majority look like this:

> Fixed /clear not resetting the terminal tab title after a conversation
> Fixed session title chip from /rename disappearing while a permission or other dialog is active
> Fixed agent panel below the prompt being hidden when subagents are running (regression in 2.1.122)
> Fixed external-editor handoff (Ctrl+G) blanking the conversation history above the prompt
> Fixed /context dumping its rendered ASCII visualization grid into the conversation, wasting ~1.6k tokens per call
> Fixed OAuth refresh race after wake-from-sleep that could log out all running sessions
> Fixed 1-hour prompt cache TTL being silently downgraded to 5 minutes
> Fixed cache-miss warning appearing spuriously after /clear or compaction when changing /effort or /model

I'd be extremely interested to know what percentage of these were just fixing last week's Claude Code-written PR that no human ever set eyes on. But hey, all that churn looks great on the charts being circulated on social media as free advertising for their flagship product (and consequently the company's valuation), so never mind, LGTM! |
|
|
| ▲ | aprilthird2021 3 days ago | parent | prev | next [-] |
Give an example. I have one from my line of work: a full service rewrite in a new language. It would have taken forever without AI; AI makes it easier and faster. The new service has better throughput and uses fewer machines. Having a complete, full test harness that lets us ensure we are meeting all the functionality of the previous service is key. AND we are keeping the old service on standby, because we know we don't know what might be wrong with the new one. What's your example? |
| |
| ▲ | adamtaylor_13 a day ago | parent | next [-] | | From another comment above: > Our projects are closed source due to our clients owning the code, but I can offer anecdote. We have a client whose business operates on 2-3 very niche SaaS applications in the veterinary/animal medicine space. In a span of about 6 months, we completely ripped out 2 of those 3 and are working on replacing the 3rd one right now. We've done this with a single senior engineer working with the client between 20-40 hours per week with no major regressions. The business has been able to continue working as usual with no disruptions throughout this process. > Obviously it's hard to measure this objectively, but I can't imagine having done this pre-AI with zero downtime and having replaced those SaaS applications in that timeframe. | |
| ▲ | pron 3 days ago | parent | prev [-] | | If you carefully review the code, then you're not doing what Armstrong was talking about. If you're not reviewing the code, then you don't really know what it is the AI built. Of course it passes tests; that's not the problem. The problem is that the code is complicated and obtuse, even if it doesn't seem that way on the surface, and after some rounds of evolution the agents are no longer able to evolve or maintain it. The gap between "it's working now" and "it will continue working in two years" is exactly the problem with AI-generated code: the tests can't tell you which one you have, and you don't know unless you look really carefully. |
|
|
| ▲ | globular-toast 3 days ago | parent | prev | next [-] |
| Does what you ship involve hundreds of lines of HTML/CSS by any chance? Do you care about accessibility? |
| |
| ▲ | adamtaylor_13 a day ago | parent [-] | | It does indeed. Most of what we build are web applications used internally by our clients (i.e., inside their business, not customer-facing). Because of that, we don't typically spend a lot of time on accessibility; it's internal-facing software, and as far as I'm aware, these businesses don't have individuals who need those accommodations. Of course, if that changed, it's something we'd need to consider. |
|
|
| ▲ | grayhatter 2 days ago | parent | prev | next [-] |
| > I am an engineer. I hire other engineers. I run a company that ships usable software for small businesses. > We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks. I've been searching for months for evidence of this kinda thing. Do you have receipts you can share? Or is it more of the same "just trust me bro"? |
| |
| ▲ | adamtaylor_13 a day ago | parent [-] | | I should put together a blog post to share more, but unfortunately it is more "trust me bro" at this stage. As you can see in a few other comments where I replied, we do have subjective evidence suggesting that we're moving much faster than we could have in the past. Of course, it's not just shipping; it's shipping stably, in a way that doesn't disrupt the day-to-day operations of the businesses we're working for. One client that comes to mind had 2-3 niche SaaS applications that they used independently for various workloads. We completely replaced 2 of those in about 6 months without any disruption to their business (no, we did not replace them feature-for-feature; we just built what they needed). |
|
|
| ▲ | willio58 3 days ago | parent | prev | next [-] |
| What you are shipping is not the same as what Coinbase is shipping. These are vastly different things. Making a shiny app with AI is great, I'm doing it as I type this. But I am under no delusion that what I make can sustain a multi-million dollar or even billion dollar business in the case of Coinbase. That's plain silly. |
| |
| ▲ | adamtaylor_13 a day ago | parent [-] | | I agree with you. I didn't intend to make the argument that what my company does and what Coinbase does are on the same level, if that's what came across. |
|
|
| ▲ | mdavid626 3 days ago | parent | prev [-] |
| Shipping garbage. |
| |
| ▲ | adamtaylor_13 2 days ago | parent [-] | | We have zero Honeybadger errors, performance is acceptable for all our routes in the application, and all of our key stakeholders are ecstatic about what we've built. Is there some other metric I should be measuring our code by? |
|