| ▲ | throwaw12 19 hours ago |
| People who are saying they're not seeing a productivity boost, can you please share where it is failing? Because I am terrified by the output I am getting while working on huge legacy codebases: it works. I described one of my workflow changes here: https://news.ycombinator.com/item?id=47271168 but in general, compared to the old way of working, I am consistently saving half of the steps, whether it's researching the codebase, integrating new things, or even making fixes. I have stopped writing code. Occasionally I jump into the changes proposed by the LLM and make manual edits if that is feasible; otherwise I revert the changes and ask it to generate again, informed by what I learned from the rejected output. I am terrified about what's coming |
|
| ▲ | yoyohello13 18 hours ago | parent | next [-] |
| The companies laying people off have no vision. My company is a successful not-for-profit and we are hiring like crazy. It's not a software company, but we have always had effectively unlimited work. Why would anyone downsize because work is getting done faster? Just do more work, get more done, get better than the competition, get better at delivering your vision. We put profits back into the community and actually make life better for people. What a crazy fucking concept, right? |
| |
| ▲ | tkgally 17 hours ago | parent | next [-] | | I suspect it depends partly on how locked each individual is into a particular type of work, both skill-wise and temperamentally. To give an example from a field where LLMs started causing employment worries earlier than software development: translation. Some translators made their living doing the equivalent of routine, repetitive coding tasks: translating patents, manuals, text strings for localized software, etc. Some of that work was already threatened by pre-LLM machine translation, despite its poor quality; context-aware LLMs have pretty much taken over the rest. Translators who were specialized in that type of work and too old or inflexible to move into other areas were hurt badly. The potential demand for translation between languages has always been immense, and until the past few years only a tiny portion of that demand was being met. Now that translation is practically free, much more of that demand is being met, though not always well. Few people using an app or browser extension to translate between languages have much sense of what makes a good translation or of how translation can go bad. Professional translators who are able to apply their higher-level knowledge and language skills to facilitate intercultural communication in various ways can still make good money. But it requires a mindset change that can be difficult. | | |
| ▲ | adelie 15 hours ago | parent [-] | | I'm not in translation, but a number of close friends are in the industry. Two trends I've noticed in the industry, which I think we're seeing mirrored in tech: 1. No one cares about quality. Even in fields you'd expect to require the 'human touch' (e.g. novel translation), publishers are replacing translators with AI. It doesn't matter if you have higher-level knowledge or skills if the company gains more from cutting your contract than it loses in sales. 2. Translation jobs have been replaced with jobs proofreading machine translations, which pays peanuts (since AI is 'doing most of the work') but in fact takes almost as much effort as translating from scratch (since AI is often wrong in very subtle ways). The comparison to PR reviews makes itself. | | |
| ▲ | thbb123 13 hours ago | parent | next [-] | | It is not entirely true that no one cares about quality. I'd like to stay optimistic and believe that those who are demanding about the quality of their output will acquire sufficient market differentiation to prevail. After all, this has been Apple's strategy since the '80s, and even though there were some ups and downs, overall it's a success. | | |
| ▲ | palmotea 10 hours ago | parent | next [-] | | > It is not entirely true that no one cares about quality. I'd like to stay optimistic and believe that those who are demanding about the quality of their output will acquire sufficient market differentiation to prevail. Maybe, but it probably requires a very strong and opinionated leader to pull off. The conventional wisdom in American business leadership seems to be to pursue the lowest level of quality you can get away with and focus on cutting costs, and you'll have to fight that every second. I don't think that's true at the individual-contributor level (pursuing quality is very motivating), but the people who move up are the ones who sound "smart" by aping conventional wisdom. > After all, this has been Apple's strategy since the '80s, and even though there were some ups and downs, overall it's a success. I might give you "since the late '90s," but there have been significant periods where that wasn't true (e.g. the early-to-mid-'90s Mac OS was buggy and had poor foundations). | |
| ▲ | kavalg 10 hours ago | parent | prev [-] | | someone still will, but quality will become really expensive |
| |
| ▲ | izacus 2 hours ago | parent | prev [-] | | In other words, AI was used to massively depress wages and lower quality of life of employees while outputting worse results. Which is what is now happening in software. |
|
| |
| ▲ | afro88 18 hours ago | parent | prev | next [-] | | This is exactly right IMO. I have never worked for a company where the bottleneck was "we've run out of things to do". That said, plenty of companies run out of actual software engineering work when their product isn't competitive. But it usually isn't competitive because they haven't been able to move fast enough | | |
| ▲ | weatherlite 12 hours ago | parent | next [-] | | I think it depends on: A) How old the product is: Twitter during its first 5 years probably had more work to do than Twitter after 15 years. I suspect that is why they were able to get rid of so many developers. B) The industry: many b2c / ecommerce businesses are straightforward and don't have an endless need for new features. This is different from deeper tech companies. | |
| ▲ | thewebguyd 9 hours ago | parent [-] | | There’s a third one, and it’s non-tech companies or companies for whom software is not a core product. They only make in-house tooling, ERP extensions, etc. Similar to your Twitter example, once the ERP or whatever is “done” there’s not much more work to do outside of updating for tax & legal changes, or if the business launches new products, opens a new location, etc. I’ve built several of such tools where I work. We don’t even have a dev team, it’s just IT Ops, and all of what I’ve built is effectively “done” software unless the business changes. I suspect there’s a lot of that out there in the world. |
| |
| ▲ | sdf2df an hour ago | parent | prev [-] | | Not moving fast enough... sure. But in what direction? The direction, and clarity about it, is the hardest part. |
| |
| ▲ | ehnto 18 hours ago | parent | prev | next [-] | | That was my insight also. As a manager, you already have the headcount approved, and your people just allegedly got some significant percentage more productive. The first thought shouldn't be, great let's cut costs, it should be great now we finally have the bandwidth to deliver faster. On a macro level, if you were in a rising economic tide, you would still be hiring, and turning those productivity gains into more business. I wonder what the parallels are to past automations. When part producing companies moved from manual mills to CNC mills, did they fire a bunch of people or did they make more parts? | | |
| ▲ | superfrank 16 hours ago | parent | next [-] | | I'm an EM as well and I've been telling my teams for a while now that I think they really only need to start worrying once our backlog starts going down instead of up. Generally, I still agree with that (and your) sentiment when you look at the long term, but in the short term, I think all of the following arguments can be made in favor of layoffs: - AI tools are expensive, so until the increased productivity translates to increased revenue we need to make room in the budget. - We expect the bottlenecks in our org to move from writing code to something else (PM or design or something), so we're cutting SWEs in anticipation of needing to move that budget elsewhere. - We anticipate the skillsets needed by developers in the AI world to be so fundamentally different from what they are now that it's cheaper to just lay people off, run as lean as possible, and rehire people with the skills we want in a year or two than it is to try to retrain. I don't necessarily agree with those arguments (especially the last one), but I think they're somewhat valid. | |
| ▲ | throwaw12 16 hours ago | parent [-] | | I see similar arguments and I don't agree either; here is why: > rehire people with the skills we want in a year or two than it is to try and retrain. Before that future arrives, your company might already be obsolete, because you will have lost your market share to new entrants. > We expect the bottlenecks in our org to move from writing code to something else I would tell them: let's leverage the current momentum and build. When those times come, offer existing people with accumulated knowledge the chance to retrain for the new type of work. If they think they're not a good fit, they can leave; if they're willing, give them a chance, invest in people, make them feel safe, and earn their trust and loyalty. > AI tools are expensive so until the increased productivity translates to increased revenue we need to make room in the budget 1. It's not that expensive: $150/seat/month -> 5 lunches? Or maybe squeeze it out of sales personnel traveling business class? 2. By the time the increased productivity is recognized by others, a company that resisted could be so far behind that it won't be able to afford hiring engineers with those skillsets. If they think $150 is expensive now, I am sure they will say "What??? $350k for this engineer? No way, I will hire contractors instead" |
| |
| ▲ | NathanielK 13 hours ago | parent | prev | next [-] | | CNC machines drove down operator wages. It's similar to the translator example: the machine code is written by someone else, but the person running the machine still needs to understand it. Simply pushing the go button is dangerous; being able to adapt is critical. Jobs where a machinist is in charge of large chunks of the process are rarer. A large shop will have one person setting up many machines to maximize throughput. |
| ▲ | anthonypasq 8 hours ago | parent | prev [-] | | business success does not scale at the speed of increased profits from layoffs. |
| |
| ▲ | anthonypasq 8 hours ago | parent | prev | next [-] | | Most businesses don't actually have an infinite amount of work with extremely high ROI. Every new project at Google, for example, has to justify its engineering spend against developing a product with margins comparable to the ad business. Why spend $10 million a year of engineering resources on a new product that might 1. completely fail or 2. be a decent product with 20% margins, when they could do nothing and keep raking in 90% margins from the ads business? |
| ▲ | throw3847r7 16 hours ago | parent | prev | next [-] | | You need a certain company culture to be able to scale up and capture this value. Most companies cannot just add new developers. AI needs documentation, automation, integration tests... It works very well for a remote-first company, but not for an in-person, informal grinding approach. Just a year ago, a client told me to delete integration tests because "they ran too long"! | |
| ▲ | joe_mamba 15 hours ago | parent [-] | | >Just year ago, client told me to delete integration tests, because "they ran too long"! Why are you surprised? Customers don't like spending money on items they see as not adding business value. Add to that QA, documentation, security audits, etc. They want to ship stuff that brings in customers and revenue from day one; everything else is a cost. | |
| ▲ | SideburnsOfDoom 11 hours ago | parent [-] | | > integration tests, QA etc ... the items that don't add business value They absolutely do add value / prevent loss, but you need some understanding in order to see that. Not seeing it is a marker of not understanding. | | |
| ▲ | joe_mamba 11 hours ago | parent [-] | | >They absolutely do add value Not to the non-technical bean counters. When they allocate money they want to see you prove how that extra money translates to an immediate ROI, and it's difficult to prove in an Excel sheet exactly what the ROI will be without making stuff up on vibes and feels. Like at one German company I was at ~15 years ago, all the devs wanted a second 19" monitor on our workstations for increased productivity, and the bean counters wouldn't approve it because they wanted proof of how that expense, across hundreds of people, would increase our productivity and by what %, to see if that would offset the cost. This is how these people think. If you don't bring hard numbers on how much their "line will go up", they won't give you money. I know this is difficult to understand from the PoV of SV Americans, where gazillions of dollars just fall from the sky at their tech companies. |
|
|
| |
| ▲ | RA_Fisher 15 hours ago | parent | prev | next [-] | | Does that extra work bring in more revenue? I think that’s the key question. | | |
| ▲ | raphaelj 14 hours ago | parent [-] | | Companies that do not reduce their workforce might outcompete you. It might not be about bringing in more revenue but about retaining market share. | |
| ▲ | Esophagus4 11 hours ago | parent [-] | | If your barrier to being competitive is a slow, bureaucratic org, restructuring and laying off might actually help long term. |
|
| |
| ▲ | crocowhile 17 hours ago | parent | prev | next [-] | | Because hiring less while getting more done increases margins. Your company is a not-for-profit, so it doesn't care about margins. Others do. |
| ▲ | threatofrain 18 hours ago | parent | prev | next [-] | | These are words without weights. At some point the put money into software option will max out. Perhaps what we should all be doing is hiring more lawyers, there's always more legal work to be done. When you don't have weights then you can reason like this. | | |
| ▲ | yoyohello13 18 hours ago | parent [-] | | I don't know what kind of software you're used to, but software is pretty much universally dog shit these days. I could probably count on one hand the number of programs I actually like using. There is astronomical room for improvement. I don't think we are hitting diminishing returns any time soon. | |
| |
| ▲ | arwhatever 17 hours ago | parent | prev | next [-] | | I’ve been screaming this too https://news.ycombinator.com/item?id=47212237 It’s refreshing to see the same sentiment from so many other people independently here. | |
| ▲ | zipy124 15 hours ago | parent | prev | next [-] | | The problem comes if you are a service like Youtube, where you have already captured almost the entire customer base. |
| ▲ | svara 15 hours ago | parent | prev | next [-] | | Yes, it's the lump of labor fallacy. Doesn't exclude the possibility of short term distribution, though. | |
| ▲ | throwaw12 18 hours ago | parent | prev | next [-] | | > Just do more work, get more done That's one of the reasons why I am terrified: it can lead to burnout, and I personally don't like babysitting a bunch of agents, because the output doesn't feel "mine", and when it's not "mine" I don't feel ownership. And I am deliberately hitting the brakes from time to time so as not to raise expectations, because I feel like I am driving someone else's car without fully understanding how it was tuned (even though I did the tuning myself, by prompting) | |
| ▲ | ako 16 hours ago | parent | next [-] | | I'm currently a product manager (was a software engineer and technical architect before), so i already lost the feeling of ownership of code. But just like when you're doing product management with a team of software engineers, testers, and UXers, with AI you can still feel ownership of the feature or capability you're shipping. So from my perspective, nothing changes regarding ownership. | | |
| ▲ | discreteevent 15 hours ago | parent [-] | | > So from my perspective, nothing changes regarding ownership. The engineer who worked with you took ownership of the code! Have you forgotten this? | | |
| ▲ | ako 15 hours ago | parent [-] | | No, that's why I wrote "from my perspective". I started long ago writing 6502 and 68000 assembly, later C, and even later Java. With every step you lose ownership of the underlying layer. This is just another step. "But it's non-deterministic!" Yes, so are developers. We need QA regardless of who or what writes the lines of code. |
|
| |
| ▲ | QuercusMax 18 hours ago | parent | prev [-] | | It feels very much like leading a team of junior engineers or even interns who are very fast but have no idea about why we're doing anything. You have to understand the problems you're trying to solve and describe the solutions in a way they can be implemented. It's not going to be written exactly like you would do it, but that's ok - because you care about the results of the solution and not its precise implementation. At some point you have to make an engineering decision whether to write it yourself for critical bits or allow the agent/junior to get a good enough result. You're reviewing the code and hand editing anyway, right? You understand the specs even if your agent/junior doesn't, so you can take credit even if you didn't physically write the code. It's the same thing. | | |
| ▲ | throwaw12 17 hours ago | parent [-] | | > It feels very much like leading a team of junior engineers or even interns who are very fast but have no idea about why we're doing anything Yes, yes! And this is a problem for me: because of the pace, my brain muscles are not developing as much as when I was doing those things myself. Before, I would change my mind while implementing the code, because I see more things while typing and digging deeper. But now, because the juniors are doing the work, they don't offer me refactorings or improvements as they quickly type out the code; they obey my commands instead of having an "aha" moment and suggesting better ways | |
| ▲ | layer8 13 hours ago | parent [-] | | There’s some hope that the industry will realize that managing clueless LLMs at high pace isn’t sustainable and leads to worse results, and some middle ground has to be found. Or we will reach AGI, so AI won’t be clueless anymore and really take your engineering job. |
|
|
| |
| ▲ | MattGaiser 17 hours ago | parent | prev | next [-] | | You would need to expand your capacity to find and define the work. I imagine that would be a major challenge. | |
| ▲ | apercu 13 hours ago | parent | prev | next [-] | | I think a lot of companies have ineffective ways to measure productivity and poor management (e.g., people who were ICs then promoted to management with no management training or experience), and incentives aren't necessarily aligned between orgs and staff, so people end up with a perverse "more headcount means I'm better than Sandy over there" mindset. Leadership and vision have been rare in my professional life (though the corporate-owned media celebrates mediocrity in leadership all the time with puff pieces). Once you get to a certain size of company, this means a lot of bloat. Heck, I've seen small(ish) companies that had as many managers and administrators as ICs. But you're not wrong; I'm just pointing out how an org with 4k people can lay off a few hundred with modest impact on the financials (though extensive impact on morale). |
| ▲ | sdf2df an hour ago | parent | prev [-] | | Stop talking sense bro, you'll get downvoted. If you look at my post history I'm essentially saying the same stuff lol. |
|
|
| ▲ | rich_sasha 12 hours ago | parent | prev | next [-] |
I find LLMs are good at essentially boilerplate code, where it's clear what to do and it just needs to be typed in, or areas where I really have no idea where to start because I'm not familiar with the codebase. For anything else, I find I spend more time coaxing them into doing 85% of what I need than it would take to do it myself. So they're not useless, but there are only so many times in a week that I need a function to pretty-print a table in some fashion. And the code they write on anything more complex than a snippet is usually written poorly enough that it's a write-once-never-touch-again situation. If the code needs to be solid, maintainable, testable, and correct (and these are kind of minimal requirements in my book), then LLMs make little impact on my productivity. They're still an improvement on Google and Stack Exchange, but again, that only gets you so far. YMMV |
| |
| ▲ | vividfrier 11 hours ago | parent | next [-] | | > I find anything else, I spend more time coaxing them into doing 85% of what I need that I'm better off doing it myself. You must be working in a very niche field with very niche functionality if that's the case? I work at a company just outside of FAANG and I work in compliance. Not a terribly complex domain but very complicated scale and data integrity requirements. I haven't written a single line of code manually in 2 weeks. Opus 4.6 just... works. Even if I don't give it all the context it just seems to figure things out. Occasionally it'll make an architectural error because it doesn't quite understand how the microservices interact. But these are non-trivial errors (i.e. humans could have made them as well) and when we identify such an error, we update the team-shared CLAUDE.md to make sure future agents don't repeat the error. | | |
| ▲ | rich_sasha 11 hours ago | parent | next [-] | | I often wonder what I am missing. Recently I wanted to wrap a low-level vendor API with a callback API (make a request struct and request id, submit, provide a callback fn, which gets called with request IDs and messages received from the vendor) into async Python (await make_request(...)). Kinda straightforward - lots of careful code for registering and unregistering callbacks, some careful thread synchronisation (callbacks get called on another thread), thinking about sane exception handling in async code. Fiddly but not rocket science. What I got sort of works, as in tests pass - this with Opus 4.5. It is usable, though it doesn't exit cleanly on errors, despite working this to death with Claude. On exception it exits dirtily and crashes, which is good enough for now. I had some fancy ideas about logging messages from the vendor to be able to replay them, to be able to then reproduce errors. Opus made a real hash of it, lots of "fuck it, comment out the assert so the test passes". This part is unusable and, worse, pollutes the working part of the project. It made a valiant effort at mocking the vendor API for testing, but really badly: instead of writing 30 lines of general code, it wrote 200 lines of inconsistent special cases that don't even work together. Asked to fix it, it just shuffles the special cases around and gets stuck. It's written messily enough that I wouldn't touch it even to remove the dead code paths. I could block out a few days for it to be fixed, but frankly in that time I can redo it all, and better. So while it works, I'm not gonna touch it. I did everything LLM proponents say. I discussed requirements. The agent had access to the API docs and vendor samples. I said "think hard" many times. Based on this we wrote a detailed spec, then a detailed implementation plan. I hand-checked a lot of the high-level definitions. And yet here I am.
By the time Opus went away and started coding, we had the user-facing API hammered out, along with key implementation details (callback -> queue -> async task in source thread routing messages, etc.) and constraints (clean handling of exceptions, threadsafe, etc.), plus the tests it had to write. Any minor detail we didn't discuss to death was coded up like a bored junior would. And this also wasn't my first attempt; this was attempt #3. The first attempt was like: here's the docs and samples, make me a Python async API. That was a disaster. The second was more like: let's discuss, make a spec, then off you go. No good. Even counting only the time of the last attempt, I would have spent less time doing this by hand myself from scratch. | |
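For readers unfamiliar with the pattern being discussed: the callback -> queue/future -> async task routing is a standard way to bridge a threaded, callback-style API into asyncio. This is a minimal sketch, not the commenter's actual code; the AsyncBridge class and the simulated vendor thread are hypothetical stand-ins for the real vendor API:

```python
import asyncio
import threading

class AsyncBridge:
    """Bridge a callback-style vendor API, whose callbacks fire on the
    vendor's own thread, onto asyncio so callers can simply await."""

    def __init__(self):
        self._pending = {}             # request_id -> (Future, owning loop)
        self._lock = threading.Lock()  # protects _pending and _next_id
        self._next_id = 0

    def _submit_to_vendor(self, request, request_id):
        # Stand-in for the real vendor submit call: simulate the vendor
        # replying with request * 2 from a different thread.
        threading.Thread(
            target=self.on_vendor_message, args=(request_id, request * 2)
        ).start()

    async def make_request(self, request):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        with self._lock:
            self._next_id += 1
            request_id = self._next_id
            self._pending[request_id] = (fut, loop)
        self._submit_to_vendor(request, request_id)
        return await fut

    def on_vendor_message(self, request_id, message):
        # Runs on the vendor's thread. asyncio Futures are not thread-safe,
        # so hand the result back to the event loop that owns the Future.
        with self._lock:
            fut, loop = self._pending.pop(request_id)
        loop.call_soon_threadsafe(fut.set_result, message)

async def main():
    bridge = AsyncBridge()
    print(await bridge.make_request(21))  # prints 42

asyncio.run(main())
```

The key move is loop.call_soon_threadsafe: the vendor thread never touches the Future directly, which is where much of the "careful thread synchronisation" in the comment above lives. Error paths (vendor errors -> fut.set_exception, unregistering on cancellation) are exactly the parts left out of this sketch.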
| ▲ | Bewelge 8 hours ago | parent | next [-] | | Just a guess, but to me it sounds like you're trying to do too much at once. When trying something like this: > lots of careful code of registering and unregistering callbacks, some careful thread synchronisation (callbacks get called in another thread), thinking about sane exception handling in async code. Fiddly but not rocket science. I'd expect CC to fail when just given requirements. The way I use it is to explicitly tell it things like "Make sure to do Y when callback X gets fired" and not "you have to be careful about thread synchronisation"; "Do X so that exceptions are always thrown when Y happens" instead of "Make sure to implement sane exception handling". I think you have to get a feeling for how explicit you have to be, because it definitely can figure out some complexity by itself. But honestly, it also requires a different way of thinking and working. It reminds me of my dad reminiscing that the skill of dictating isn't used at all anymore. Since computers, typing, or more specifically correcting what has been typed, has become cheap, and the skill of being able to formulate a sentence "on the first try" is less valuable. I see an (inverse) parallel to working with AI vs writing the code yourself. When coding yourself, you don't have to explicitly formulate everything you are doing. Even if you are writing code with great documentation, there's no way it could contain all of the tacit knowledge you as the author have. At least that's how I feel working with it. I got started with Claude Code 2 months ago, and for a greenfield project I am amazed at how much I could get done. For existing, sometimes messy side projects it works a lot worse, but that's also because it's more difficult to describe explicitly what you want. | |
| ▲ | rich_sasha 7 hours ago | parent [-] | | > The way I use it is to explicitly tell it things like: "Make sure to do Y when callback X gets fired" and not "you have to be careful about thread synchronisation". "Do X, so that Exceptions are always thrown when Y happens" instead of "Make sure to implement sane Exception handling". At this point I'm basically programming in English, no? Trying to squeeze exact instructions into an inherently ambiguous representation. I might as well write code at this point, if this is the level of detail required. For this to work, I need to be able to say "make this thread-safe", maybe "by using a queue", not explain which synchronisation primitive to use in every last piece of the code. This is my point, actually. If I describe the task to accuracy level X, it still doesn't seem to work. To make it work, perhaps I need to describe it to level Y>X, but that for now takes me more time than doing it myself. There are lots of variables here: how fast I am at writing code or planning structure, how close to spec the thing needs to be, etc. My first "vibe code" was a personal productivity app in Claude Code, in Flutter (task timing). I have 0 idea about Dart or Flutter or any web stuff, and yet it made a complete app that did some stuff, worked on my phone, with a nice GUI, all from just a spec. From scratch, it would have taken me weeks. ...though in the end, even after 3 attempts, the final thing still didn't actually work well enough to be useful. The timer would sometimes get stuck or crash back down to 0, and it froze when the app was minimised. | |
| ▲ | Bewelge 6 hours ago | parent [-] | | > At this point I'm basically programming in English, no? Yea, except they can handle some degree of complexity. Its usefulness obviously really depends on that degree. And I'm sure there are still a lot of domains and types of software where that tradeoff between doing it yourself or spelling it out isn't worth it. |
|
| |
| ▲ | Izkata 8 hours ago | parent | prev | next [-] | | Based on what I've seen and heard, you have the happy path working and that's what the pro-AI people are describing with huge speedups. Figuring out and fixing the edge cases and failure modes is getting pushed into the review stage or even users, so it doesn't count towards the development time. It can even count as more speed if it generates more cases that get handled quickly. | | |
| ▲ | rich_sasha 8 hours ago | parent [-] | | I'm not sure I agree with this approach, or at least it doesn't work in my area. It's like self-driving cars: having 90% reliability is almost as good as 0%. I have to be confident the thing is going to work correctly, or at worst fail predictably. I can see that there are a lot of applications where things can just randomly fail and you retry / restart; that helps with crashes. But if the AI can't even make it not crash, what's to say it does the right thing when it succeeds? Again, it depends on the relative cost of errors etc. |
| |
| ▲ | stiiv 10 hours ago | parent | prev [-] | | > On exception it exits dirtily and crashes, which is good enough for now Silent failures and unexplained crashes are high on my list of things to avoid, but many teams just take them for granted in spite of the practical impact. I think that a lot of orgs have a culture of "ship it and move on," accompanied by expectations like: QA will catch it, high turnover/lower-skill programmers commit stuff like this all the time anyway, or production code is expected to have some rough edges. I've been on teams like that, mostly in bigger orgs with high turnover and/or low engineering standards. |
| |
| ▲ | mbbutler 8 hours ago | parent | prev [-] | | Two use cases recently where Claude sucked for me: 1. Performance-critical code to featurize byte slices for use in an ML model. Claude kept trying to take multiple passes over the slice when the featurization can obviously be done in one. After I finally got it to do the featurization in one pass, it was double-counting some bytes but not others (double-counting all of them would have been fine, since the feature vector gets normalized). Overall it was just very frustrating, because this should have been straightforward and instead it was dogshit. 2. Performance-critical code that iterates over lines of text and possibly applies transformations, similar to sed. Claude kept trying to allocate new Strings inside of the hot loop for lines that were not transformed. When I told it to use Cow<'a, str> instead, so that the untransformed lines, which make up the majority of processed lines, would not need a new allocation, Claude completely fucked up the named lifetimes. Importantly, my CLAUDE.md already tells Claude to use copy-on-write types to reduce allocations whenever possible. The agent just ignored it, which is _the_ issue with LLMs: they're non-deterministic and any guidance you provide is ultimately just a suggestion. |
| |
| ▲ | joenot443 11 hours ago | parent | prev [-] | | > I spend more time coaxing them into doing 85% of what I need that I'm better off doing it myself What was the last thing you built in which you felt this was the case? |
|
|
| ▲ | mirsadm 13 hours ago | parent | prev | next [-] |
| I have an app which is fairly popular. This release cycle I used Claude Code and codex to implement all the changes / features. It definitely let me move much quicker than before. However now that it's in the beta stage the amount of issues and bugs is insane. I reviewed a lot of the code that went in as well. I suspect the bug fixing stage is going to take longer than the initial implementation. There are so many issues and my mental model of the codebase has severely degraded. It was an interesting experiment but I don't think I would do it again this way. |
| |
| ▲ | aurareturn 11 hours ago | parent | next [-] | | The last 10% takes up 90% of the time. Usually, the time is spent finding issues you didn't even know about. This was true before LLMs. | |
| ▲ | johsole 6 hours ago | parent | prev | next [-] | | I make mistakes when writing code, but I know what types of mistakes I make. With AI it's like a coworker who makes mistakes, sometimes they're obvious to me and sometimes they're not, because I make different mistakes. | |
| ▲ | truetraveller 13 hours ago | parent | prev | next [-] | | Thanks for the insight! | |
| ▲ | sdf2df an hour ago | parent | prev | next [-] | | "There are so many issues and my mental model of the codebase has severely degraded." Not only that: the less coding you do in general, fixing issues that in the past would've been a doddle (muscle memory) becomes harder due to atrophy. Swear most people don't think straight and can't see the obvious. | |
| ▲ | sdf2df an hour ago | parent | prev | next [-] | | Congrats. Now post this more often so the bozos who downvote posts that push against pro-LLM stuff f-off. I came to the same conclusion when producing a video with Grok. It did the job but it was utterly painful and definitely very costly - I used 50 free-trial accounts and maxed them out each day for a month. I'm pretty sure these conclusions hold across all models, and therefore the technology by extension. | |
| ▲ | maplethorpe 13 hours ago | parent | prev [-] | | Rather than trying to fix the bugs yourself, have you tried asking Claude to fix them for you? | | |
| ▲ | mirsadm 13 hours ago | parent [-] | | I have already been doing this. I could keep doing it but I'm not going to. I want to be able to understand my own code because that is what lets me make sound higher-level decisions. |
|
|
|
| ▲ | jpollock 16 hours ago | parent | prev | next [-] |
| The last time I tried AI, I tested it with a stopwatch. The group used feature flags...

    if (a) {
      // new code
    } else {
      // old code
    }

    void testOff() {
      disableFlag(a);
      // test it still works
    }

    void testOn() {
      enableFlag(a);
      // test it still works
    }

However, as with any cleanup, it doesn't happen. We have thousands of these things lying around taking up space. I thought "I can give this to the AI, it won't get bored or complain." I can do one flag in ~3 minutes: code edit, PR prepped and sent. The AI can do one in 10 minutes, but I couldn't look away. It kept trying to use find/grep to search through a huge repo to find symbols (instead of the MCP service). Then it ignored instructions and didn't clean up one or the other test, left unused fields or parameters, and generally made a mess. Finally, I needed to review and fix the results, taking another 3-5 minutes, with no guarantee that it compiled. At that point, a task that takes me 3 minutes has taken me 15. Sure, it made code changes, and felt "cool", but it cost the company 5x the cost of not using the AI (before considering the token cost). Even worse, the CI/CD system couldn't keep up with my individual velocity of cleaning these up; with an automated tool? Yeah, not going to be pleasant. However, I need to try again; everyone's saying there was a step change in December. |
| |
| ▲ | laserlight 15 hours ago | parent | next [-] | | I did my own experiment with Claude Code vs Cursor tab completion. The task was to convert an Excel file to a structured format. Nothing fancy at all. Claude Code took 4 hours, with multiple prompts. At the end, it started to break the previous fixes in favor of new features. The code was spaghetti. There was no way I could fix it myself or steer Claude Code into fixing it the right way. Either it was a dead-end or a dice roll with every prompt. Then I implemented my own version with Cursor tab completion. It took the same amount of time, 4 hours. The code had a clear object-oriented architecture, with a structure for evolution. Adding a new feature didn't require any prompts at all. As a result, Claude Code was worse in terms of productivity: the same amount of time, worse quality output, no possibility of (or at best very high cost of) code evolution. | | |
| ▲ | thesamethrowawa 15 hours ago | parent | next [-] | | Are you able to share your prompts to Claude Code? I assume not, they are probably not saved - but this genuinely surprised me, it seems like exactly the type of task an LLM would excel at (no pun intended!). What model were you using OOI? | | |
| ▲ | laserlight 14 hours ago | parent [-] | | > this genuinely surprised me Me too. After listening to all the claims about Claude Code's productivity benefits, I was surprised to get the result I got. I'm not able to share details of my work. I was using Claude Opus 4.5, if I recall correctly. |
| |
| ▲ | shinycode 15 hours ago | parent | prev [-] | | The exact same prompt? Everything depends on the prompt, and these are different tools. These days the quality of what’s built around the prompt matters as much as the code. We can’t just feed it a generic query. |
| |
| ▲ | sensanaty 12 hours ago | parent | prev | next [-] | | Similar happened to me just now. Claude whatever-is-the-latest-and-greatest, in Claude Code. I also tried out Windsurf's Arena Mode, with the same failure. To intercept the inevitable "holding it wrong" comments, we have all the AGENTS.md and RULES.md files and all the other snake oil you're told to include in the project. It has full context of the code, and even the ticket. It has very clear instructions on what to do (the kind of instructions I would trust an unpaid intern with, let alone a tool marketed as the next coming of Cyber Jesus that we're paying for), in a chat with minimal context used up already. I manually review every command it runs, because I don't trust it running shell scripts unsupervised. I wanted it to finish up some tests that I had already prefilled; basically all the AI had to do was convert my comments into the final assertions. A few minutes of looping later, I see it finish and all tests are green. A third of the tests were still unfilled, I guess left as an exercise for the reader. Another third was modified beyond what I told it to do, including hardcoding some things which made the test quite literally useless, and the last third was fine, but because of all the miscellaneous changes it made I had to double check those anyways. This is about the bare minimum where I would expect these things to do good work: a simple take a comment -> spit out the `assert()` block. I ended up wasting more time arguing with it than if I had just done the menial task of filling out the tests myself. It sure did generate a shit ton of code though, and ran in an impressive-looking loop for 5-10 minutes! And sure, the majority of the test cases were either not implemented or hardcoded so that they wouldn't actually catch a breakage, but it was all green!! That's ultimately where this hype is leading us.
It's a genuinely useful tool in some circumstances, but we've collectively lost the plot because untold billions have poured into these systems and we now have clueless managers and executives seeing "tests green -> code good" and making decisions based on that. | |
| ▲ | embedding-shape 14 hours ago | parent | prev | next [-] | | What model, what harness, and about how long was the prompt you used to fire off this piece of work? All three matter a lot, but all are missing from your experience report. | |
| ▲ | sdf2df 40 minutes ago | parent | prev [-] | | [dead] |
|
|
| ▲ | wasmainiac 18 hours ago | parent | prev | next [-] |
| Because its failure rate is too high. Beyond boilerplate code and CRUD apps, if I let AI run freely on the projects I maintain, I spend more time fixing its changes than if I just did it myself. It hallucinates functionality, it designs itself into corners, it does not follow my instructions, it writes too much code for simple features. It’s fine at replacing what Stack Overflow did nearly a decade ago, but that isn’t really an improvement from my baseline. |
| |
| ▲ | qudat 12 hours ago | parent | next [-] | | It’s not that it just makes mistakes but it also implements things in ways I don’t like or are not relevant to the business requirements or scope of the feature / project. I end up replacing any saved time with QA and code review and I really don’t see how that’s going to change. In my mind I see Claude as a better search engine that understands code well enough to find answers and gain understanding faster. That’s about it. | | |
| ▲ | anthonypasq 8 hours ago | parent [-] | | Can you imagine, two years in the future, still believing this will be true? You are just dragging your feet. You will give in sooner or later, and I would suggest sooner. |
| |
| ▲ | leptons 17 hours ago | parent | prev [-] | | That's my experience too. It's okay at a few things that save me some typing, but it isn't really going to do the hard work for me. I also still need to spend significant amounts of time figuring out what it did wrong and correcting it. And that's frustrating. I don't make those mistakes, and I really dislike being led down bad paths. If "code smells" are bad, then "AI" is a rotting corpse. | | |
| ▲ | thewebguyd 8 hours ago | parent [-] | | > If "code smells" are bad, then "AI" is a rotting corpse. This is what's so frustrating about the hype bros for me. In most cases, everything AI spits out is riddled with code smells. We're all just supposed to toss out every engineering principle we've learned, all so the owner class can hire fewer developers and suppress wages? I'm sure it's working great for everyone working on SaaS CRUD or web apps, but it's still not anywhere close to solving problems outside that sphere. Native? It's very hit and miss. It has very little design sense (because, why would it? It's a language model), so it chokes on SwiftUI, and it can't stop using deprecated stuff. And that's not even that specialized. It still hallucinates cmdlets if you try to do anything with PowerShell, and has near zero knowledge about the industry I work in, a historically not tech-forward industry where things are still shared in handcrafted PDF reports emailed out to subscribers. I'm going to leave this field entirely if the answer just becomes "make everything in React/React Native because it's what the AI does best." |
|
|
|
| ▲ | apsurd 18 hours ago | parent | prev | next [-] |
| AI dramatically increases velocity. But is velocity productivity? Productivity relative to which scope: you, the team, the department, the company? The question is really, velocity _of what_? I got this from a HN comment. It really hit for me because the default mentality for engineers is to build. The more you build the better. That's not "wrong" but in a business setting it is very much necessary but not sufficient. And so whenever we think about productivity, impact, velocity, whatever measure of output, the real question is _of what_? More code? More product surface area? That was never really the problem. In fact it makes life worse majority of the time. |
| |
| ▲ | mattmanser 17 hours ago | parent [-] | | The real question is, is it increasing their velocity? They've already admitted they just 'throw the code away and start again'. I think we've got another victim of perceived productivity gains vs actual productivity drop: people sitting around watching Claude churn out poor code at a slower rate than if they just wrote it themselves. Don't get me wrong, it's great for getting you started or writing a little prototype. But the code is bad and riddled with subtle bugs, and if you're shoving large amounts of AI code into your codebase without rewriting it, good luck in 6-12 months' time. |
|
|
| ▲ | matheusmoreira 34 minutes ago | parent | prev | next [-] |
| Terrifying doesn't quite say it. The situation we're in is either we achieve a post-scarcity society or we'll all die. |
|
| ▲ | tripledry 15 hours ago | parent | prev | next [-] |
| Something I've been thinking about lately is whether there is value in understanding the systems we produce, and whether we're expected to. If I can just vibe and shrug when someone asks why production is down globally, then I'm sure the number of features I can push out increases. But if I am still expected to understand and fix the systems I generate, I'm not convinced it's actually faster to vibe and then try to understand what's going on rather than to think and write. In my experience, the more I delegate to AI, the less I understand the results. The "slowness and thinking" might just be a feature, not a bug. At times I feel that AI was simply the final straw that gave the nudge to lower standards. |
| |
| ▲ | joe_mamba 15 hours ago | parent [-] | | >if I can just vibe and shrug when someone asks why production is down globally You're pretty high up in the development, decision and value-addition chain, if YOU are the responsible go-to person for these questions. AI has no impact on your position. | | |
| ▲ | tripledry 14 hours ago | parent [-] | | Naa, I'm just a programmer. Experience may vary depending on company and country, for me this has been true from tiny startups to global corporations. Tangential, I don't even know what "responsible" in the corporate world means anymore, it seems to me no one is really responsible for anything. But the one thing that's almost certain is that I will fix the damn thing if I made it go boom. |
|
|
|
| ▲ | exfalso 12 hours ago | parent | prev | next [-] |
| It's failing when there is no data in the training set, and there are no patterns to replicate in the existing code base. I can give you many, many examples of where it failed for me:

1. Efficient implementation of Union-Find: complete garbage result
2. Spark pipelines: mostly garbage
3. Fuzzer for testing something: half success; the non-replicable ("creative") part was garbage
4. Confidential Computing (niche): complete garbage if starting from scratch, good at extracting existing abstractions and replicating existing code

Where it succeeds:

1. SQL queries
2. Following more precise descriptions of what to do
3. Replicating existing code patterns

The pattern is very clear. Novel things, things that require deeper domain knowledge or have little data, and coming up with the to-be-replicated patterns themselves don't work. Everything else works. I believe the reason there is a big split in the reception is that senior engineers work on problems that don't have existing solutions, and LLMs are terrible at those. What they are missing is that the software and the methodology must be modified in order to make the LLM work. There are methodical ways to do this, but this shift in the industry is still in its infancy, and we don't yet have a shared understanding of what the methodology is. Personally I have very strong opinions on how this should be done. But I'm urging everyone to start thinking about it, perhaps even going as far as quitting if this isn't something people can pursue at their current job. The carnage is coming :/ |
|
| ▲ | msvana 17 hours ago | parent | prev | next [-] |
| I work as an ML engineer/researcher. When I implement a change in an experiment it usually takes at least an hour to get the results. I can use this time to implement a different experiment. Doesn't matter if I do it by hand or if I let an agent do it for me, I have enough time. Code isn't the bottleneck. I also heard an opinion that since writing code is cheap, people implement things that have no economic value without really thinking it through. |
| |
| ▲ | apsurd 17 hours ago | parent [-] | | +1 on the economic value line. Not everything needs to be about money, but if you get paid to ship code, it's about money. And now we have coworkers shipping insane amounts of "features" because it's all free to ship, and, being engineers, they think it ends there. Only it doesn't: there's product positioning, UX, information architecture, onboarding and training, support, QA, change management, analytics, reporting… sigh | |
| ▲ | embedding-shape 14 hours ago | parent [-] | | > but if you get paid to ship code it's about money. Tip to budding software engineers: try not to work in these sorts of places, as they're about "looking busy" rather than engineering software. The latter is where real, long-lasting things are built; the former is where startup founders spend most of their money. The last paragraph is where the tricky and valuable parts are, and also where AI isn't super helpful today, and where you as a human can actually help out a lot if you're just 10% better than the rest of the "engineers" who only want to ship as fast as possible. |
|
|
|
| ▲ | kodablah 19 hours ago | parent | prev | next [-] |
| > People who are saying they're not seeing productivity boost, can you please share where is it failing? At review time. There are simply too many software industries that can't delegate both authorship _and_ review to non-humans because the maintenance/use of such software, especially in libraries and backwards-compat-concerning environments, cannot justify an "ends justifies the means" approach (yet). |
|
| ▲ | belZaah 17 hours ago | parent | prev | next [-] |
| I think the objections are not necessarily about lack of productivity, although my personal experience is not that of massive productivity increases. The fact that you are producing code much faster is likely just to push the bottleneck somewhere else. Software value cycles are long and complicated. What if you run into an issue in 5 years that the LLM fails to diagnose or fix due to complex system interactions? How often would that happen? Would it be feasible to just generate the whole thing anew, matching functionality precisely? Are you making the right architecture choices from the perspective of what the preferred modus operandi of an LLM is in 5 years? We don’t know. The more experienced folks tend to be conservative, as they have experienced how badly things can age. Maybe this time it’ll be different? |
|
| ▲ | dumfries 15 hours ago | parent | prev | next [-] |
| "it works" is a very low standard when it comes to software engineering. Why are we not holding AI generated code to the same standards as we hold our peers during code reviews? I have never heard anyone say "it works" as a positive thing when reviewing code.. Yes, there is a productivity boost but you can't tell me there is no decrease in quality |
|
| ▲ | kranke155 16 hours ago | parent | prev | next [-] |
| I work in commercials. We can now make $1 million commercials for $100,000 or less, a 90% reduction in costs, if we use AI. The issue is they don’t look great. AI isn’t that great at some key details. But the agencies are really trying to push for it. They think this is the way back to the big flashy commercials of old. Budgets are lower than ever, and shrinking. The big issue here is really a misunderstanding of cause: budgets are lower because advertising has changed in general (TV is less and less important) and a lot of studies showed that advertising is actually not all that effective. So they are grabbing onto a lifeboat. But I’m worried there’s no land. I’ve planned my exit. |
| |
| ▲ | uxcolumbo 16 hours ago | parent [-] | | Is advertising not that effective in general, or just for certain channels, i.e. TV? Also, what are you exiting to? | |
| ▲ | kranke155 15 hours ago | parent | next [-] | | So my understanding - from a friend at WPP who told me the same and from a Freakonomics episode - is that advertising was wildly oversold before digital. When the metrics arrived with digital, they saw that advertising, in some ways, was just not as effective as they’d hoped; the ROI wasn’t there. Seth Godin agrees. He says that advertising in the digital era could be as simple as just having a good product. I think this is Tesla’s position on it: make the best product and the internet takes care of it. Legacy companies have kept large ad budgets, but those are diminishing. My friend at WPP told me their data science team showed that, outside of a new product or a product that is not recognised by consumers, the actual outcomes from ads are marginal or incremental. That’s what he told me. If your product is already known to consumers, the ROI is questionable. | |
| ▲ | mikkupikku 13 hours ago | parent [-] | | Advertising's foremost job is to sell the premise of advertising to business management. Selling the business's product is always secondary to that. | | |
| ▲ | ambicapter 11 hours ago | parent [-] | | Always felt suspicious to me that so much of company dynamics is basically about selling yourself to management... and there's one team in the company whose full-time job is selling? Wonder how that will turn out. | |
| ▲ | gzread 10 hours ago | parent [-] | | None of my coworkers could figure out why I was laid off, and were shocked because I was important to getting the work done, but management made it clear I hadn't been selling myself to management. |
|
|
| |
| ▲ | kranke155 15 hours ago | parent | prev [-] | | My exit is storytelling. I think that’s the only thing that will remain. I suspect humans will still want to hear stories about and from other humans. There’s something about AIs that feels wrong for storytelling. I just don’t think people will want AIs to tell them stories. And if they do… Well, I believe in human storytelling. |
|
|
|
| ▲ | oytis 16 hours ago | parent | prev | next [-] |
| > I have stopped writing code, occasionally I jump into the changes proposed by LLM and make manual edits if it is feasible, otherwise I revert changes and ask it to generate again but based on my learnings from the past rejected output Isn't it a very inefficient way to learn things? Like, normally, you would learn how things work and then write the code, refining your knowledge while you are writing. Now you don't learn anything in advance, and only do so reluctantly when things break? In the end there is a codebase that no one knows how it works. |
| |
| ▲ | throwaw12 15 hours ago | parent | next [-] | | > Isn't it a very inefficient way to learn things? It is. But there are 2 things: 1. Do I want to learn that? (If I am coming back to this topic in 5 months, knowledge accumulates, but there is a temptation to finish the thing quickly, because it is so boring to swim in a huge legacy codebase.) 2. How long does it take to grasp it and implement the solution? If I can complete it with AI in 2 days vs on my own in 2 weeks, I probably do not want to spend too much time on it. As I mentioned in other comments, this is exactly what makes me worried about the future of the work I will be doing: there is no attachment to the product in my brain, no mental models being built, no muscles trained. It feels like someone else's "work", because it explores the code, it writes the code. I just judge it when I get a task. | |
| ▲ | oytis 15 hours ago | parent [-] | | I don't know where it goes, but it sounds pretty dumb for the companies involved too. Tech companies are in the business of nurturing teams knowledgeable in things so they can build something that gives them an advantage over competition. If there is no knowledge being built, there is no advantage and no tech business. | | |
| ▲ | hobofan 15 hours ago | parent [-] | | > Tech companies are in the business of nurturing teams knowledgeable in things It pains the anti-capitalist fibers in my body to say this, but no they are not. At the maximum the value is in organizational knowledge and existing assets (= source code, documentation), so that people with the least knowledge possible can make changes. In software companies in general, technical excellence and knowledge is not strongly correlated with economic success as long as you clear a certain bar (that's not that high). In comparison, in hardware/engineering companies, that's a lot more correlated. In the concrete example of a legacy codebase we have here, there is even less value in trying to build up knowledge in the company, as it has already been decided that the system is to be discarded anyways. |
|
| |
| ▲ | hobofan 16 hours ago | parent | prev [-] | | > you would learn how things work and then write the code In a legacy codebase this may require learning a lot of things about how things work just to make small changes, which may be much less efficient. | | |
| ▲ | oytis 15 hours ago | parent [-] | | I might still be naive about the industry, but if you don't know how the legacy codebase works, you might either delegate the change to someone else in the company who does, or, if there is no one left, use this opportunity to become the person who knows at least something about it. |
|
|
|
| ▲ | pinkmuffinere 18 hours ago | parent | prev | next [-] |
| I asked opus 4.6 how to administer an A/B test when data is sparse. My options are to look at conversion rate, look at revenue per customer, or something else. I will get about 10-20k samples; fewer than that will add to cart, fewer will begin checkout, and fewer still will convert. Opus says I should look at revenue per customer. I don't know the right answer, but I know it is not to look at revenue per customer -- that will have high variance due to outlier customers who put in a large order. To be fair, I do use opus frequently, and it often gives good enough answers. But you do have to be suspicious of its responses for important decisions. Edit: Ha, and the report claims it's relatively good at business and finance... Edit 2: After discussion in this thread, I went back to opus and asked it to link to articles about how to handle non-normally distributed data, and it actually did link to some useful articles, and an online calculator that I believe works for my data. So I'll eat some humble pie and say my initial take was at least partially wrong here. At the same time, it was important to know the correct question to ask, and honestly, if it wasn't for this thread, I'm not sure I would have gotten there. |
| |
| ▲ | onion2k 18 hours ago | parent [-] | | A/B tests are a statistical tool, and outliers will mess with any statistical measure. If your data is especially prone to that you should be using something that accounts for them, and your prompt to Opus should tell it to account for that. A good way to use AI is to treat it like a brilliant junior. It knows a lot about how things work in general but very little about your specific domain. If your data has a particular shape (e.g lots of orders with a few large orders as outliers) you have to tell it that to improve the results you get back. | | |
| ▲ | pinkmuffinere 17 hours ago | parent [-] | | I did tell it that I expect to see something like a power-law distribution in order value, so I think I pretty much followed your instructions here. Btw, if you do know the right thing to do in my scenario, I'd love to figure it out. This is not my area of expertise, and just figuring it out through articles so far. | | |
| ▲ | Karrot_Kream 17 hours ago | parent [-] | | I recommend reading Wikipedia and talking to LLMs to get this one. Order values do follow power-law distributions (you're probably looking for an exponential or a Zipf distribution.) You want to ask how to perform a statistical test using these distributions. I'm a fan of Bayesian techniques here, but it's up to you if you want to use a frequentist approach. If you can follow some basic calculus you can follow the math for constructing these statistical tests, if not some searching will help you find the formulas you need. | | |
| ▲ | pinkmuffinere 17 hours ago | parent [-] | | Thanks for the suggestions! I didn't want to do the math myself, but I did take your suggestion and found some articles discussing ways to make it work even with a non-normal distribution: - https://cxl.com/blog/outliers/ - https://www.blastx.com/insights/the-best-revenue-significanc... - (online tool to calculate significance) https://www.blastx.com/rpv-calculator I'm not checking their math, but the articles make sense to me, and I trust they did implement it correctly. In the end the LLM did get me to the correct answer by suggesting the articles, so I guess I should eat some humble pie and say it _did_ help me. At the same time, if I didn't have the intuition that using rpv as-is in a t-test would be noisy, and the suggestions from this comment thread, I think I could have gone down the wrong path. So I'm not sure what my conclusion is -- maybe something like LLMs are helpful once you ask the right question. | | |
| ▲ | Karrot_Kream 16 hours ago | parent [-] | | One heuristic I like to use when thinking about this question (and I honestly wish the answer space here were less emotionally charged, so we could all learn from each other) is that: LLMs need a human to understand the shape of the solution to check the LLM's work. In fields that I have confirmed expertise in, I can easily nudge and steer the LLM and only skim its output quickly to know if it's right or wrong. In fields I don't, I first ask the LLM for resources (papers, textbooks, articles, etc) and familiarize myself with some initial literature first. I then work with the LLMs slowly to make a solution. I've found that to work well so far. (I also just love statistics and think it's some of the most applicable math to everyday life in everything from bus arrival times to road traffic to order values to financial markets.) | | |
| ▲ | pinkmuffinere 7 hours ago | parent [-] | | I think this is a _really_ insightful answer about effectively working with LLMs. And you’re winning me over on statistics too :) |
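To make the "non-normal distribution" point in this subthread concrete: one dependency-free way to compare revenue per visitor without assuming normality is a bootstrap test. The sketch below (toy data, a tiny inline xorshift RNG) is purely illustrative and is not taken from the linked articles or calculator:

```rust
// Minimal xorshift64 RNG so the sketch needs no external crates.
fn next(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

// Mean of one bootstrap resample: draw with replacement, same size.
fn resample_mean(xs: &[f64], state: &mut u64) -> f64 {
    let n = xs.len() as u64;
    let mut total = 0.0;
    for _ in 0..xs.len() {
        total += xs[(next(state) % n) as usize];
    }
    total / xs.len() as f64
}

// Fraction of bootstrap resamples in which variant B's mean revenue per
// visitor beats variant A's. No normality assumption is needed: outlier
// orders just widen the resampling spread instead of silently skewing
// a t-statistic's variance estimate.
fn bootstrap_prob_b_beats_a(a: &[f64], b: &[f64], iters: usize, seed: u64) -> f64 {
    let mut state = seed;
    let mut wins = 0;
    for _ in 0..iters {
        if resample_mean(b, &mut state) > resample_mean(a, &mut state) {
            wins += 1;
        }
    }
    wins as f64 / iters as f64
}
```

If the resulting fraction hovers near 0.5 the test is inconclusive; a value near 0 or 1 suggests a real difference. With 10-20k mostly-zero samples per arm, this kind of resampling is also cheap enough to run directly on the raw data.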
|
|
|
|
|
|
|
| ▲ | iugtmkbdfil834 16 hours ago | parent | prev | next [-] |
| I don't want to generalize from my specific situation too much, but I want to offer an anecdote from my neck of the woods. On my personal sub, I agree it is kinda crazy the kind of projects I can get into now with little to no prior knowledge. On the other hand, our corporate AI is... not great at the moment. It was briefly kinda decent and then it suddenly degraded. Worst of all, no one is communicating with us, so we don't know what was changed. It is possible companies are already trying to 'optimize'. I know it is not exactly what you are asking. You are saying the capability is there, but I am personally starting to see a crack in corporate willingness to spend. |
|
| ▲ | gurghet 13 hours ago | parent | prev | next [-] |
| Basically it tricks you into making the code less maintainable, so while it seems to boost productivity, it's just delaying huge failures. |
| |
| ▲ | rootusrootus 7 hours ago | parent [-] | | Exactly this, IMO. We are in a race to see whether the mountain of technical debt that AI is creating grows faster than AI's ability to whittle it down later. |
|
|
| ▲ | girvo 11 hours ago | parent | prev | next [-] |
| Sometimes I’m scared. Sometimes I realise that this particular task has been slower than if I’d done it myself, once I take into account full wall-clock time. I can’t yet tell ahead of time which type of task is going to work. |
|
| ▲ | aurareturn 19 hours ago | parent | prev | next [-] |
> People who are saying they're not seeing productivity boost, can you please share where is it failing?
Believe it or not, I still know many devs who do not use any agents. They're still using free ChatGPT copy and paste. I'm going to guess that many people on HN are also on the "free ChatGPT isn't that good at programming" train. |
| |
| ▲ | dataflow 19 hours ago | parent | next [-] | | Which one would you recommend as the best right now? Claude Code? | | |
| ▲ | redhed 6 hours ago | parent [-] | | I have been having a lot of success with Cursor. I like being able to switch between Anthropic and OpenAI models. Claude Code does give way more tokens/$ than Cursor right now, though. |
| |
| ▲ | salawat 18 hours ago | parent | prev | next [-] | | Not everyone has the capability to rent out data center tier hardware to just do their job. These things require so much damn compute you need some serious heft to actually daisy chain enough stages either in parallel or deep to get enough tokens/sec for the experience to go ham. If you're making bags o' coke money, and deciding to fund Altman's, Zuckernut's or Amazon/Google's/Microsoft's datacenter build out, that's on you. Rest of us are just trying to get by on bits and bobs we've kept limping over the years. If opencode is anything to judge the vibecoded scene by, I'm fairly sure at some point the vibe crowd will learn the lesson of isolating the most expensive computation ever from the hot loop, then maybe find one day all they needed was maybe something to let the model build a context, and a text editor. Til then wtf_are_these_abstractions.jpg | | |
| ▲ | ijk 6 hours ago | parent [-] | | This is my current problem: I can get work to pay for a Claude Max subscription, but for personal use or to learn how to use it that's a big price tag. I worry that we're returning to an era of renting core development tools. After the huge benefits from free and open source tools, that's a bitter pill to swallow. |
| |
| ▲ | throwaw12 19 hours ago | parent | prev [-] | | > They're still using free ChatGPT copy and paste Probably that's the reason why some people are sure their job is still safe. The nature of the job is changing rapidly | | |
| ▲ | aurareturn 19 hours ago | parent [-] | | I totally get tech CEOs who threaten to fire their devs who do not embrace AI tools. I'm not a tech CEO but people who are anti-LLM for programming have no place on my team. | | |
| ▲ | salawat 18 hours ago | parent [-] | | And you are paying for their tokens on top of their salary, right? Right? | | |
| ▲ | aurareturn 18 hours ago | parent | next [-] | | You can do a lot with just a $20 Codex CLI subscription. Tokens are cheap compared to the $20k we're paying for a dev each month. | | |
| ▲ | ido 18 hours ago | parent | next [-] | | Even the $200 claude max monthly subscription is peanuts compared to salary cost. | | |
| ▲ | monksy 17 hours ago | parent [-] | | Tell that to the company that I was just at, which cut IntelliJ licenses as a cost-cutting measure. | | |
| ▲ | aurareturn 17 hours ago | parent [-] | | If they really want to cut cost, fire the worst dev on the team and use that money to give everyone a Codex subscription. | | |
| ▲ | KronisLV 16 hours ago | parent [-] | | Or better yet, fire the managers or bean counters that think decreasing everyone’s productivity is good for long term savings. I’m reminded of https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s... | | |
| ▲ | mikkupikku 13 hours ago | parent [-] | | Fire the middle management, HR, and etc that have been enthusiastically using AI to do their jobs for the past two or three years already. 90% of them can be replaced by an agent with access to an email account. | | |
| ▲ | thewebguyd 8 hours ago | parent [-] | | Tbh, if companies want to use AI to lay off and cut costs, that's exactly where they should be doing it, not engineering. How much bloat and bureaucratic bottleneck is sitting in middle management whose favorite pastime is wasting everyone's time on meetings that could have been an email? HR? Not the execs, but the HR drones that do nothing but answer employee questions about policy could have been replaced a long time ago with not even an AI, just an old-school chatbot. Instead of cutting engineers, cut the non-tech jobs and flatten the structure. |
|
|
|
|
| |
| ▲ | hdgvhicv 17 hours ago | parent | prev | next [-] | | Amazes me that people pay 20k a month for a dev rather than paying 2k a month for one in Poland or 1k a month for one from India. There’s obviously a benefit to paying higher rates for US programmers, but does that benefit change when LLMs are thrown into the mix? | | |
| ▲ | apercu 13 hours ago | parent | next [-] | | My experience with outsourcing over 20+ years (Russia, Romania, India, South America) is that you just move money around when you do it. It takes more planning, more specification, more coordination, more QA. The quality is almost always worse, and remediation takes forever. So your BA, QA and PM time goes way up and absorbs any cost savings. YMMV. | | |
| ▲ | thewebguyd 8 hours ago | parent [-] | | Sounds like an LLM, tbh. Using Claude also takes more planning, more explicit specification, prompting, more manual review, more QA. |
| |
| ▲ | aurareturn 12 hours ago | parent | prev | next [-] | | Makes no sense, because LLMs make it far less worthwhile to outsource developers. | |
| ▲ | forgotlastlogin 13 hours ago | parent | prev [-] | | 2k in Poland you say... |
| |
| ▲ | baq 18 hours ago | parent | prev [-] | | Exactly, the $20 Codex is such good value it’s irresponsible not to give it to everyone. The $20 Claude Code tier is otoh pointless; the limits are good enough for maybe 10 mins of work twice per business day. |
| |
| ▲ | onion2k 17 hours ago | parent | prev | next [-] | | Every business that's taking AI seriously is giving their team enterprise accounts to AI services. Otherwise you have no control over where your code, data, company info, etc is going. Someone deciding to drop a spreadsheet of customer data into their personal AI account to increase their productivity would be catastrophic for business, so you need rules. And rules means paying for enterprise AI tooling. | |
| ▲ | mikkupikku 13 hours ago | parent | prev [-] | | "Bring your own tools" is not exactly novel in the workplace. Maybe so for office workers, but not more generally. Anyway, these particular tools are cheap enough that it hardly even matters who is expected to pay for them. The $20 a month tier in particular is a trivial expense, on par with businesses that expect their workers to wear steel toed shoes. Some may give workers a little stipend to buy those boots, some not. Either way, it doesn't really matter. | | |
| ▲ | thewebguyd 8 hours ago | parent [-] | | Just because it's not novel, doesn't mean it's right. I also don't agree with, for example, many mechanics being forced to buy their own tools (especially what little they get paid). I don't do tech outside of 9-5, so either my employer pays for it all, or I don't use it. Simple as that. Thankfully, they do pay for it, but I couldn't imagine working somewhere that says "You need to use AI" and then not providing it on their dime. Quite frankly it should be regulation that if a W2 employee needs something to perform their job duties, the employer must provide it. |
|
|
|
|
|
|
| ▲ | staticassertion 16 hours ago | parent | prev | next [-] |
| When it comes to novel work, LLMs become "fast typers" for me and little more. They accelerate testing phases but that's it. The bar for novelty isn't very high either - "make this specific system scale in a way that others won't" isn't a thing an LLM can ever do on its own, though it can be an aid. LLMs also are quite bad for security. They can find simple bugs, but they don't find the really interesting ones that leverage "gap between mental model and implementation" or "combination of features and bugs" etc, which is where most of the interesting security work is imo. |
| |
| ▲ | gilbetron 10 hours ago | parent | next [-] | | What was your take on this? https://aisle.com/blog/what-ai-security-research-looks-like-... | |
| ▲ | asadm 16 hours ago | parent | prev | next [-] | | I think your analysis is a bit outdated these days or you may be holding it wrong. I am doing novel work with codex but it does need some prompting ie. exploring possibilities from current codebase, adding papers to prompt etc. For security, I think I generally start a new thread before committing to review from security pov. | | |
| ▲ | staticassertion 16 hours ago | parent [-] | | You can do novel work with an LLM. You can. The LLM can't. It can be an aid - exploring papers, gathering information, helping to validate, etc. It can't do the actual novel part, fundamentally it is limited to what it is trained on. If you are relying on the LLM and context, then unless your context is a secret your competitor is only ever one prompt behind you. If you're willing to pursue true novelty, you need a human and you can leap beyond your competition. | | |
| ▲ | bdangubic 13 hours ago | parent [-] | | of course you need a human but do not need nearly as many humans as there are currently in the labor force | | |
| ▲ | staticassertion 12 hours ago | parent [-] | | Maybe, but I'm not really convinced. LLMs make some aspects of the job faster, mainly I don't have to type anymore. But... that was always a relatively small portion of the job. Design, understanding constraints, maintaining and operating code, deciding what to do, what not to do, when to do it, gaining consensus across product, eng, support, and customers, etc. I do all of those things as an engineer. Coding faster is really awesome, it's so nice, and I can whip up POCs for the frontend etc now, and that's accelerating development... but that's it. The reality is that a huge portion of my time is spent doing similar work and what LLMs largely do is pick up the smaller tasks or features that I may not have prioritized otherwise. Revolutionary in one sense, completely banal and a really minor part of my job in many others. | | |
| ▲ | bdangubic 10 hours ago | parent [-] | | I think the core issue (evidenced by the constant stream of debates on HN) is that everyone’s experience with LLMs is different. I think we can all agree that some experiences are like yours, while others are vastly different. Sometimes I hear “you just don’t know how to use them etc…” as if there is some magic setup that makes them do shit, but the reality is that our actual jobs are drastically different even though we all technically have the same titles. I have been a contractor for a decade now and have been on projects that require real “engineers” doing real hardcore shit. I have also been on projects where tens of people are doing work I could train my 12-year-old daughter to be proficient in within a month. I would gauge that the percentage of the former is much smaller than the latter |
|
|
|
| |
| ▲ | truetraveller 13 hours ago | parent | prev [-] | | This is basically my take as well! |
|
|
| ▲ | vividfrier 11 hours ago | parent | prev | next [-] |
| Same. Whenever an article like this one pops up, the comments seem to be filled with confirmation bias: people who don't see a productivity boost agree with the article. I work at a tech company just outside of big tech and I feel fairly confident that within 3-4 years we won't need the number of developers we currently have. The bottleneck right now is reviewing, and I think it's just a matter of time before our leadership removes the requirement for human code reviews (I am already seeing signs of this: "Maybe for code behind feature flags we don't need code reviews?"). Whenever there's an incident, a PagerDuty trigger sends an agent to look at the metrics, logs, and software component graphs, and it gives you a hypothesis on what caused the incident. When I push a branch with test failures, I get one-click buttons in my PR to append commits fixing those test failures (i.e. an agent analyses the code, the Jira ticket, the tests, etc. and suggests a fix for the failing tests). We have a Slack agent we can ping for trivial feature requests (or bugs) in our support channels. The agents are being integrated at every step. And it's not like the agents will stop improving. The difference between GPT-3.5 and Opus 4.6 is massive, so what will the models look like five years from now? We're cooked, and the easiest way to tell someone works at a company that hasn't come very far in its AI journey is that they're not worried. |
|
| ▲ | dataflow 18 hours ago | parent | prev | next [-] |
| I feel like this might be heavily dependent on both your task and the AI you're using? What language do you code in and what AI do you use? And are your tasks pretty typical/boilerplate-y with prior art to go off of, or novel/at-the-edge-of-tech? |
|
| ▲ | sivanmz 17 hours ago | parent | prev | next [-] |
| It’s been my experience recently. I point it at an issue tracker and ask it to investigate, write a test to reproduce the problem, and plan a fix together. There’s lots of hand-holding from me, but it saves me a lot of work, and I’ve been surprised by its comfort with legacy codebases. For now I feel empowered, and I’m actually working more intensively, but I was wondering to myself whether I’m going to run out of work this year. Interestingly, our metrics show that output is slowed by the increased workload on reviewers. |
|
| ▲ | motbus3 12 hours ago | parent | prev | next [-] |
| I think you can get more stuff done earlier, but the quality is not good, or it doesn't work as expected once you tinker with it enough.
Fixing the issues in the generated code usually doesn't work at all |
|
| ▲ | boxedemp 19 hours ago | parent | prev | next [-] |
| I'm with you. The project I'm working on is moving at phenomenal velocity. I'm basically spending my time writing specs and performing code reviews. As long as my code review comments and design docs are clear I get a secure, scalable, and resilient system. Tests were always important, but now they are the gatekeepers to velocity. |
|
| ▲ | RandomLensman 18 hours ago | parent | prev | next [-] |
| Outside of coding/non-physical areas, the impact can be quite muted. I haven't seen much impact on surgical procedures, for example (but maybe others have?). |
|
| ▲ | KronisLV 17 hours ago | parent | prev | next [-] |
| I’m currently working across like 5 projects (it was 4 last week, but you know how it is). I now do more in days than others might in a week. Yesterday a colleague didn’t quite manage to implement a loading container with a Vue directive instead of DOM hacks; it was easier for me to just throw AI at the problem and produce a working, tested solution with developer docs than to have a similarly long meeting and have them iterate for hours. Then I got back to training a CNN to recognize crops from space (ploughing and mowing will need to be estimated alongside inference, since there are no markers in the training data, but we can look at BSI changes, for example), deployed a new version of an Ollama/OpenAI/Anthropic proxy that can work with AWS Bedrock and updated the docs site instructions, deployed a new app that will have a standup bot and on-demand AI code review (LiteLLM and Django), and am working on codegen to migrate some Oracle Forms that have been stagnating otherwise. It’s not funny how overworked I am, and sure, I still have to babysit parallel Claude Code sessions and sometimes test things manually and write out changes, but this is completely different work compared to two or three years ago. Maybe the problem spaces I’m dealing with are nothing novel, but I assume most devs are like that - and I’d be surprised at people’s productivity not increasing. When people nag in meetings about needing to change something in a codebase, or not knowing how to implement something and its value add, I’ll often have something working shortly after the meeting is over (having started during it). Instead of sending Vitest to the backlog graveyard, I had it integrated and running in one or two evenings with about 1200 tests (and fixed some bugs along the way). Instead of talking about hypothetical Oxlint and Oxfmt performance improvements, I had both benchmarked against ESLint and Prettier within the hour.
Same for making server config changes with Ansible that I previously skipped due to the additional friction - it is mostly just gone (as long as I plan some free time in case things get fucked up and I need to fix them). Edit: oh, and in my free time I built a Whisper + VLM + LLM pipeline based on OpenVINO so that I can feed it hours-long stream VODs and get an EDL cut to a desired length that I can then import into DaVinci Resolve and work on video editing after the first basic editing prepass is done (also PySceneDetect and some audio alignment to prevent bad cuts). Then I integrated it with a subscription Claude Code, not just LiteLLM and cloud providers with per-token costs, for the actual cut-making part (scene descriptions and audio transcriptions stay local since those don't need a complex LLM, but it can use the cloud for cuts). Oh, and I'm moving from my Contabo VPSes to a Hetzner Server Auction server that now runs Proxmox with VMs in it, except this time around I'm moving over to Ansible for managing it instead of manual scripts, and also migrating from Docker Swarm to regular Docker Compose + Tailscale networks (maybe Headscale later), using more upstream containers where needed instead of trying to build all of mine myself, since storage isn't a problem and consistency isn't that important. At the same time I also migrated from Drone CI to Woodpecker CI and from Nexus to Gitea Packages, since I'm already using Gitea and since Nexus is a maintenance burden. If this becomes the new “normal” for everyone’s productivity though, there will be an insane amount of burnout and devaluation of work. |
| |
| ▲ | Karrot_Kream 17 hours ago | parent | next [-] | | > When people nag in meetings about needing to change something in a codebase, or not knowing how to implement something and its value add, I’ll often have something working shortly after the meeting is over (due to starting during it). We've started building harnesses to allow people who don't understand code to create PRs to implement their little nags. We rely on an engineer to review, merge, and steward the change but it means that non-eng folks do not rely on us as a gate. (We're a startup and can't really afford "teams" to do this hand-holding and triage for us.) As you say we're all a bit overworked and burned out. I've been context switching so much that on days when I'm very productive I've started just getting headaches. I'm achieving a lot more than before but holding the various threads in my head and context switching is just a lot. | |
| ▲ | leptons 17 hours ago | parent | prev [-] | | >I now do more in days than others might in a week. I've always done more in days than others might in a week. YMMV. | | |
| ▲ | sph 10 hours ago | parent [-] | | So do I, this is why I work 15 hours a week [1] and laugh at those that use this new productivity tool to work themselves even harder for the same pay. Wasn’t the point of automation to work less? 1: pre-AI. Not keen on becoming a manager of an idiot savant, so I’m planning my exit. |
|
|
|
| ▲ | randusername 7 hours ago | parent | prev | next [-] |
| I see an individual productivity boost, but not necessarily a collective one. I don't think features per hour is really what is holding back most established businesses. My experiences suggest that we still have some time before the people that understand the plumbing of the business _and_ AI bubble up to positions of authority through wielding it practically and successfully at increasingly greater scale. |
|
| ▲ | fulafel 18 hours ago | parent | prev | next [-] |
| A terminology tangent, because it's an econ publication: notice that the article doesn't talk about productivity. Productivity is a term of art in economics and means you generate more units of output (for example per person, per input, per wages paid), but it doesn't take quality or desirability into account. It's best suited for commodities and industrial outputs (and maybe slop?). |
|
| ▲ | zozbot234 13 hours ago | parent | prev | next [-] |
| > I am terrified about what's coming Why? This is great. AI fixing up huge legacy codebases is just taking the jobs humans would never want to do. |
|
| ▲ | lm28469 15 hours ago | parent | prev | next [-] |
| Meanwhile, Gemini tells me my Go code doesn't compile (it does), gaslights me by telling me I must be a time traveler because I use Go 1.26 when the latest version is actually 1.24, and tells me I can't use wg.Go() because that function does not exist (it does) |
|
| ▲ | drekipus 15 hours ago | parent | prev | next [-] |
| > my job is easier now, I do less.
> I am terrified about what's coming. God I hope I never ever have to work with you |
|
| ▲ | therealdrag0 19 hours ago | parent | prev | next [-] |
| I can only explain it by people not having used agentic tools, or only having tried them 9 months ago for a day before giving up, or having such strict coding-style preferences that they burn time adjusting generated code to their taste and blame the AI, even though those are non-functional changes they didn't bother to encode into rules. The productivity gains are blatantly obvious at this point. Even in large distributed codebases. From junior to senior engineers. |
| |
| ▲ | MattGaiser 17 hours ago | parent [-] | | I can see someone who is very particular about their way being the right way having issues with it. I’m very much the kind of person who believes that if I can’t write a failing test, I don’t have a very serious case. A lot of devs aren’t like that. | | |
| ▲ | layer8 12 hours ago | parent [-] | | Sometimes you’re unable to write a failing test because the code is such that you can’t reliably reason about it, and hence have a hard time finding the cases where it will do the wrong thing. Being able to reason about code in that way is as important as for the code to be testable. |
|
|
|
| ▲ | truetraveller 19 hours ago | parent | prev [-] |
| You were probably deficient in RESEARCH skills before. No offense to you, since I was also like this once. LLMs do the research and put the results on the plate. Yes, for people who were deficient in research skills, I can see 2-3x improvements. Note1: I have "expert" level research skills. LLMs still help me with research, but the boost is probably 1.2x max. Note2: By research, I mean googling, GitHub search, forum search, etc., and quickly testing using jsfiddle/codepen, etc. |
| |
| ▲ | throwaw12 19 hours ago | parent | next [-] | | no worries, I do not get offended quickly. But I also think you are overestimating your RESEARCH skills. Even if you are very good at research, I am sure you can't read 25 files in parallel, summarize them (even if it's missing some details) in 1 minute, and then come up with a somewhat working solution in the next 2 minutes. I am pretty sure humans can't comprehend 25 code files, each with at least 400 lines of non-boilerplate code, in 2 minutes. An LLM can do it, and it's very, very good at summarizing. I can even steer its summarizing by prompting where to focus when it's reading files (because now I can iterate 2-3 times on each RESEARCH task and improve my next attempt based on shortcomings in the previous one) | | |
| ▲ | truetraveller 14 hours ago | parent [-] | | OK, it's not just RESEARCH, but the "RESEARCHability" of the source content [in this case code], and also critical analysis ability [not saying you are deficient in anything, speaking in general terms]. In this example, if the 25 files are organized nicely, and I had a nice IDE that listed the class/namespace members of each file neatly, I might take 30 minutes to understand the overall structure. Moreover, if I critically analyzed this, I would ask "how many times does this event of summarizing 25 files happen?" I mean, are we changing codebases every day? No, it's a one-time cost. Moreover, manually going through will provide insight not returned by the LLM. Obviously, every case is different, and perhaps you do need to RESEARCH new codebases often, I dunno! |
| |
| ▲ | siva7 16 hours ago | parent | prev | next [-] | | Ok Mr. Expert Level Researcher, go back and read the comment of parent again to find out that it has nothing to do with deficiency in research skills. | | | |
| ▲ | throwaw12 17 hours ago | parent | prev [-] | | please don't change your comment constantly (or at least add UPDATE 1/2/3), because you had different words before, like you were saying something in this fashion: * you probably lack good RESEARCH skills * I can see at most 1.25x improvements - now it is 2-3x By updating your comment you are making my reply irrelevant to your past response | | |
| ▲ | truetraveller 14 hours ago | parent [-] | | Apologies, I changed this within a ~10 minute period. Never knew you would actually see it so fast. |
|
|