Trillions spent and big software projects are still failing (spectrum.ieee.org)
262 points by pseudolus 11 hours ago | 240 comments
rossdavidh 4 hours ago | parent | next [-]

It's a great article, until the end where they say what the solution would be. I'm afraid that the solution is: build something small, and use it in production before you add more features. If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level. There is no software development process which reliably produces software that works at scale without doing it small, and medium-sized, first, and fixing what goes wrong before you go big.
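A minimal sketch of that "small, then bigger" ladder as a promotion gate, assuming each stage reports how many payroll runs it processed and how many went wrong. The stage names follow the comment; the head counts, the `next_stage` helper and the zero-error default are hypothetical choices for illustration, not any real payroll system's policy.

```python
# Hypothetical rollout ladder: stage names follow the comment above,
# the head counts are made up for illustration.
STAGES = [
    ("small town", 50),
    ("larger town", 500),
    ("small city", 5_000),
    ("large city", 50_000),
    ("province", 1_000_000),
    ("national", 30_000_000),
]


def next_stage(current: int, errors: int, processed: int,
               max_error_rate: float = 0.0) -> int:
    """Return the index of the stage to run next.

    Stay at the current stage until it has processed some runs and its
    error rate is within budget; only then advance one step.
    """
    if processed == 0:
        return current  # no evidence yet, so no promotion
    if errors / processed > max_error_rate:
        return current  # fix what went wrong before going bigger
    return min(current + 1, len(STAGES) - 1)  # never past "national"


stage = 0
stage = next_stage(stage, errors=2, processed=50)  # bugs found: stay
print(STAGES[stage][0])  # small town
stage = next_stage(stage, errors=0, processed=50)  # clean run: advance
print(STAGES[stage][0])  # larger town
```

The gate makes promotion depend on observed results at the current stage rather than on the calendar; loosening `max_error_rate` becomes a deliberate, visible decision.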

shagie an hour ago | parent | next [-]

> If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level.

At a large box retail chain (15 states, ~300 stores) I worked on a project to replace the POS system.

The original plan had us getting everything working (Ha!) and then deploying it out to stores and then ending up with the two oddball "stores". The company cafeteria and surplus store were technically stores in that they had all the same setup and processes but were odd.

When the team that I was on was brought into this project, we flipped that around and first deployed to those two several months ahead of the schedule to deploy to the regular stores.

In particular, the surplus store had a few dozen transactions a day. If anything broke, you could do reconciliation by hand. The cafeteria had single-register transaction volume that surpassed a surplus store on most any other day. Furthermore, all of its transactions were payroll deductions (swipe your badge rather than credit card or cash). This meant that if anything went wrong there, we weren't in trouble with PCI and could debit and credit accounts.

Ultimately, we made our deadline to get things out to stores. We did have one nasty bug that showed up in late October (or was it early November?) with repackaging counts (if a box of 6 was $24 and a single item was $4.50, then buying 6 single items "repackaged" them to cost $24 rather than $27), which interacted with a BOGO sale. That bug resulted in absurd receipts with sales and discounts (the receipt showed you spent $10,000 but were discounted $9,976 ... and then the GMs got alerts that the store was not able to make payroll because of a $9,976 discount ... one of the devs pulled an all-nighter to fix that one and it got pushed to the stores).

I shudder to think about what would have happened if we had tried to push the POS system out to customer-facing stores before the performance issues in the cafeteria were worked out, or if we had had to reconcile transactions to hunt down incorrect tax calculations.
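The repackaging rule described above can be sketched as a pricing function, assuming every full group of 6 singles is simply repriced at the $24 case price. The constants and the `repackaged_total` name are hypothetical, and the BOGO interaction that caused the actual bug isn't described in enough detail to model, so it's left out.

```python
from decimal import Decimal

# Hypothetical constants matching the example in the comment.
CASE_SIZE = 6                  # singles per box
CASE_PRICE = Decimal("24.00")  # price of a box of 6
UNIT_PRICE = Decimal("4.50")   # price of one single item


def repackaged_total(quantity: int) -> Decimal:
    """Price `quantity` singles, repricing each full group of CASE_SIZE at CASE_PRICE."""
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    cases, singles = divmod(quantity, CASE_SIZE)
    return cases * CASE_PRICE + singles * UNIT_PRICE


print(repackaged_total(5))  # 22.50
print(repackaged_total(6))  # 24.00 rather than 27.00
print(repackaged_total(7))  # 28.50
```

Using `Decimal` rather than floats keeps cent amounts exact, which matters when totals have to be reconciled by hand.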

einpoklum an hour ago | parent [-]

You could have, in principle, implemented the new system to be able to run in "dummy mode" alongside the existing system at regular stores, so that you see that it produces the 'same' results in terms of what the existing system is able to provide.

Which is to say, there is more than one approach to gradual deployment.
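A rough sketch of that "dummy mode" (often called shadow mode), assuming both systems can be called as functions on the same basket: the legacy result is always the one used, and the candidate's result is only compared and logged. `ShadowRunner`, the basket shape and both pricers are hypothetical. As the reply below points out, in a real POS the hard part is feeding the same hardware input (scanner, keyboard, card reader) to two systems, which this doesn't touch.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, List, Optional, Tuple

Basket = List[Tuple[str, int]]  # hypothetical: (sku, quantity) pairs
Pricer = Callable[[Basket], Decimal]


@dataclass
class ShadowRunner:
    """Run a candidate pricer in "dummy mode" next to the legacy one."""
    legacy: Pricer     # system of record: its total is what the customer pays
    candidate: Pricer  # new system: its total is only compared, never used
    mismatches: List[Tuple[Basket, Decimal, Optional[Decimal]]] = field(default_factory=list)

    def total(self, basket: Basket) -> Decimal:
        expected = self.legacy(basket)
        try:
            actual: Optional[Decimal] = self.candidate(basket)
        except Exception:
            actual = None  # a crash in the shadow is recorded, not surfaced to the till
        if actual != expected:
            self.mismatches.append((basket, expected, actual))
        return expected


def legacy_pricer(basket: Basket) -> Decimal:
    return sum((qty * Decimal("4.50") for _, qty in basket), Decimal("0"))


def candidate_pricer(basket: Basket) -> Decimal:
    # Deliberately buggy for the demo: ignores quantity.
    return sum((Decimal("4.50") for _ in basket), Decimal("0"))


runner = ShadowRunner(legacy_pricer, candidate_pricer)
print(runner.total([("SKU-1", 1)]))  # 4.50 (both systems agree)
print(runner.total([("SKU-1", 3)]))  # 13.50 (candidate said 4.50)
print(len(runner.mismatches))        # 1
```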

shagie 3 minutes ago | parent [-]

Not easily when issues of PCI get in there.

Things like the credit card reader (and magnetic ink reader for checks), a different input device (sending the barcode scanner to two different systems), and keyboard input (completely different screens and keyed entry) would have made those hardware problems also things that needed to be solved.

The old system was a DOS-based one where a given set of F-keys were used to switch between screens on a . Need to do hand entry of a SKU? That was F4 and then type the number. Need to do a search for the description of an item? That was F5. The keyboard was particular to that register setup and used an old-school XT (5-pin DIN) plug. The new systems were much more modern Linux boxes that used USB plugs. The mag strip reader was flashed to new screens (and the old ones were replaced).

In this situation, we couldn't simply send the keyboard, scanner, and credit card events to another register as well.

solatic 3 hours ago | parent | prev | next [-]

That's what works for products, not software systems. Gradual growth inevitably results in loads of technical debt that is not paid off as Product adds more feature requests to deliver larger and larger sales contracts. Eventually you want to rewrite to deal with all the technical debt, but nobody has enough confidence to say what is in the codebase that's important to Product and what isn't, so everybody is afraid and frozen.

Scale is separately a Product and Engineering question. You are correct that you cannot scale a Product to delight many users without it first delighting a small group of users. But there are plenty of scaled Engineering systems that were designed from the beginning to reach massive scale. WhatsApp is probably the canonical example of something that was a rather simple Product with very highly scaled Engineering and it's how they were able to grow so much with such a small team.

jimbokun 5 minutes ago | parent | next [-]

Yes, it can be very difficult to add “scale” after the fact, once you already have a lot of data persisted in a certain way.

Jtsummers 2 hours ago | parent | prev | next [-]

Designing or intending a system to be used at massive scale is not the same as building and deploying it so that it only initially runs at that massive scale.

That's just a recipe for disaster, "We don't even know if we can handle 100 users, let's now force 1 million people to use the system simultaneously." Even WhatsApp couldn't handle hundreds of millions of users on the day it was first released, nor did it attempt to. You build out slowly and make sure things work, at least if you're competent and sane.

solatic 2 hours ago | parent | next [-]

Sure, but if you did a good job, the gradual deployment can go relatively quickly and smoothly, which is how $FAANG roll out new features and products to very large audiences. The actual rollout is usually a bit of an implementation detail of what first needed to be architected to handle that larger scale.
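One common way to implement that kind of gradual rollout is deterministic percentage bucketing: hash a stable user ID together with the feature name and enable the feature for buckets below the current percentage. This is a minimal sketch under that assumption; the function names and the `"new-checkout"` feature are hypothetical, and real feature-flag systems add targeting rules, kill switches and monitoring on top.

```python
import hashlib


def rollout_bucket(feature: str, user_id: str) -> int:
    """Map a (feature, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(feature: str, user_id: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` buckets."""
    return rollout_bucket(feature, user_id) < percent


users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 10, 50, 100):
    enabled = sum(is_enabled("new-checkout", u, percent) for u in users)
    print(percent, enabled)  # roughly percent% of the 10,000 users
```

Because a user's bucket never changes, raising the percentage from 10 to 50 only adds users; nobody who already had the feature loses it mid-rollout.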

mk89 2 hours ago | parent | prev [-]

No but whatsapp was built by 2 guys that had previously worked at Yahoo, and they picked a very strong tech for the backend: erlang.

So while they probably didn't bother scaling the service to millions in the first version, they 1) knew what it would take, 2) chose already from the ground up a good technology to have a smoother transition to your "X millions users". The step "X millions to XYZ millions and then billions" required other things too.

At least they didn't have to write a PHP-to-C++ compiler like Facebook did, given Mark Zuckerberg's initial design choice, which shows exactly what it means to begin something with the right tools and ideas in mind.

But this takes skills.

Jtsummers 2 hours ago | parent | next [-]

> No but whatsapp was built by 2 guys that had previously worked at Yahoo, and they picked a very strong tech for the backend: erlang.

https://news.ycombinator.com/item?id=44911553

Started as PHP, not as Erlang.

> 1) knew what it would take, 2) chose already from the ground up a good technology to have a smoother transition to your "X millions users".

No, as above, that was a pivot. They did not start from the ground up with Erlang or ejabberd, they adopted that later.

mk89 41 minutes ago | parent [-]

Thanks, somehow I remembered wrong.

nradov 2 hours ago | parent | prev [-]

Did they succeed because of Erlang or in spite of Erlang? We can't draw any reliable conclusions from a single data point. Maybe a different platform would have worked even better?

awesome_dude an hour ago | parent [-]

Yeah, the technology used is a separate concern from their abilities as users (developers) of that technology and their effectiveness at handling the scale.

I, for example, have always said that I am more than capable of writing code in C that is several orders of magnitude SLOWER than what I could write in... say, Python.

My skillset would never be used as an example of the value of C for whatever.

lelandbatey 20 minutes ago | parent | prev | next [-]

Gradual growth =/= many tacked-on features. Many tacked-on features =/= technical debt. Technical debt =/= "everybody is afraid and frozen." Those are merely often correlated, but not required.

Whatsapp is a terrible example because it's barely a product; Whatsapp is mostly a free offering of goodwill riding on the back of actual products like Facebook Ads. A great example would be a product like Salesforce, SAP, or Microsoft Dynamics. Those products are forced to grow and change and adapt and scale, to massive numbers doing tons of work, all while being actual products and being software systems. I think such products act as stark rebukes of what you've described.

dustingetz 3 hours ago | parent | prev | next [-]

we get paid to add to it, we don’t get paid to take away

paulsutter 2 hours ago | parent | prev [-]

You have to design for scale AND deploy gradually

hinkley 3 hours ago | parent | prev | next [-]

The dominant factor is: there is a human who understands the entire system.

That is vastly easier to achieve by making a small, successful system, which gets buy in from both users and builders to the extent that the former pay sufficient money for the latter to be invested in understanding the entire system and then growing it and keeping up with the changes.

Occasionally a moon shot program can overcome all of that inertia, but the “90% of all projects fail” is definitely overrepresented in large projects. And the Precautionary Principle says you shouldn’t because the consequences are so high.

OtherShrezzing 3 hours ago | parent | prev | next [-]

While I think this is good advice in general, I don’t think your statement that “there is no process to create scalable software” holds true.

The UK gov development service reliably implements huge systems over and over again, and those systems go out to tens of millions from day 1. As a rule of thumb, the parts of the UK govt digital suite that suck are the parts the development service hasn't been assigned to yet.

The SWIFT banking org launches reliable features to hundreds of millions of users.

There’s honestly loads of instances of organisations reliably implementing robust and scalable software without starting with tens of users.

sjclemmy 2 hours ago | parent | next [-]

The UK government development service, as you call it, is not a service. It's more of a declaration of process that is adhered to across the diverse departments and organisations that make up the government. It's usually small teams that are responsible for exploring what a service is or needs and then implementing it. They are able to deliver decent services because they start small, design and user test iteratively, and only scale out when there is a really good understanding of what's being delivered. The technology is the easy bit.

sam_lowry_ 2 hours ago | parent | prev [-]

SWIFT? Hold my beer. SWIFT has not launched anything substantial since its startup days in the early '70s.

Moreover, their core tech has not evolved that far from that era, and the '70s tech bros are still there through their progeny.

Here's an anecdote: The first messaging system built by SWIFT was text-based, somewhat similar to ASN.1.

The next one used XML, as it was the fad of the day. Unfortunately, neither SWIFT nor the banks could handle a 2-3 order-of-magnitude increase in payload size in their ancient systems. Yes, as engineers, you would think compressing XML would solve the problem, and you would be right. Moreover, XML Infoset already existed, and it defined compression as a function of the XML Schema, so it was somewhat more deterministic, even though not more efficient than LZMA.

But the suits decided differently. At one of the SIBOS conferences they abbreviated the XML tags, literally on paper, without thinking about back-and-forth translation, dupes, etc.

And this is how we ended up with the ISO 20022 abbreviations that we all know and love: Ccy for Currency, Pmt for Payment, Dt for Date, etc.
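To put rough numbers on the compression-versus-abbreviation trade-off, here is a sketch that serializes a toy batch of payments with long and abbreviated tag names and compresses both. The message layout is made up (it is not a real ISO 20022 schema), and zlib's DEFLATE stands in for LZMA or a schema-aware binary XML encoding. Exact sizes will vary, but repeated tag names cost very little once compressed.

```python
import zlib

# Hypothetical element names. Only Pmt, Ccy and Dt come from the comment;
# "Amount"/"Amt" is added to make the toy message a little more realistic.
LONG_TAGS = {"pmt": "Payment", "ccy": "Currency", "amt": "Amount", "dt": "Date"}
SHORT_TAGS = {"pmt": "Pmt", "ccy": "Ccy", "amt": "Amt", "dt": "Dt"}


def build_batch(tags: dict, count: int) -> bytes:
    """Serialize `count` toy payment records using the given tag names."""
    parts = []
    for i in range(count):
        parts.append(
            f"<{tags['pmt']}>"
            f"<{tags['ccy']}>EUR</{tags['ccy']}>"
            f"<{tags['amt']}>{100 + i}.00</{tags['amt']}>"
            f"<{tags['dt']}>2024-01-{1 + i % 28:02d}</{tags['dt']}>"
            f"</{tags['pmt']}>"
        )
    return "".join(parts).encode("utf-8")


long_xml = build_batch(LONG_TAGS, 200)
short_xml = build_batch(SHORT_TAGS, 200)

print("raw:       ", len(long_xml), len(short_xml))
print("compressed:", len(zlib.compress(long_xml)), len(zlib.compress(short_xml)))
```

Compressed payloads are no longer readable on the wire, which is the audit concern raised in the reply below; abbreviating tags keeps messages readable at the cost of cryptic names.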

noname120 8 minutes ago | parent [-]

Harder to audit when every payload needs to be decompressed to be inspected

hintymad an hour ago | parent | prev | next [-]

> https://www.amazon.com/How-Big-Things-Get-Done/dp/0593239512

This is what https://www.amazon.com/How-Big-Things-Get-Done/dp/0593239512 advocates too: start small, modularize, and then scale. The example of Tesla's mega factory was particularly enticing.

the_duke 3 hours ago | parent | prev | next [-]

The accounting, legal and business process requirements are vastly different at different scales, different jurisdictions, different countries, etc.

There's a crazy amount of complexity and customizability in systems like ERPs for multinational corporations (SAP, Oracle).

When you start with a small town, you'll have to throw most of everything away when moving to a different scale.

That's true for software systems in general. If major requirements are bolted on after the fact, instead of designed into the system from the beginning, you usually end up with an unmaintainable mess.

nostrademons 35 minutes ago | parent | prev | next [-]

Came here to say this. I still think that Linus Torvalds has the most profound advice to building a large, highly successful software system:

"Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. If you do, you'll just overdesign and generally think it is more important than it likely is at that stage. Or worse, you might be scared away by the sheer size of the work you envision. So start small, and think about the details. Don't think about some big picture and fancy design. If it doesn't solve some fairly immediate need, it's almost certainly over-designed. And don't expect people to jump in and help you. That's not how these things work. You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project."

-- Linux Times, October 2004.

chrisweekly 3 hours ago | parent | prev | next [-]

See also Gall's Law:

"All complex systems that work evolved from simpler systems that worked"

Izikiel43 3 hours ago | parent | prev [-]

What works at small scale possibly won't work at a huge scale.

skywhopper an hour ago | parent [-]

But what hasn’t even been tried at a small scale definitely won’t work at a huge scale.

dockd 5 minutes ago | parent | prev | next [-]

If it makes anyone feel better, it's not just software:

https://en.wikipedia.org/wiki/Auburn_Dam

https://en.wikipedia.org/wiki/Columbia_River_Crossing

If you're 97% over budget, are you successful? https://en.wikipedia.org/wiki/Big_Dig

BirAdam 7 hours ago | parent | prev | next [-]

I study and write quite a bit of tech history. IMHO from what I've learned over the last few years of this hobby, the primary issue is quite simple. While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning. Typically, software folks build new and every generation of software developers must relearn the same problems.

malfist 6 hours ago | parent | next [-]

I work at $FANG. Every one of our org's big projects goes off the rails at the end of the project, and there's always a mad rush at the end to push developers to solve all the failures of project management in their off hours before the arbitrary deadline arrives.

After every single project, the org comes together to do a retrospective and ask "What can devs do differently next time to keep this from happening again?" People leading the project take no action items, management doesn't hold themselves accountable at all, nor does product for late-changing requirements. And so, the cycle repeats next time.

One time, after a big bug made it to production following one of those crunches, I led an effort to write up the root cause. The document painted a picture of a huge, complicated project handed off to offshore junior devs with no supervision, and of the junior devs managing it being completely switched out twice during the 8-month project with no handover and no introspection by leadership. My manager's manager killed the document and wouldn't allow publication until I removed any action items that would constrain management.

And thus, the cycle continues to repeat, balanced on the backs of developers.

ajkjk 4 hours ago | parent | next [-]

Of course the reason it works this way is that it works. As much as we'd like accountability to happen on the basis of principle, it actually happens on the basis of practicality. Either the engineers organize their power and demand a new relationship with management, or projects start going so poorly that necessity demands a better working relationship, or nothing changes. There is no 'things get better out of wisdom alone' option; the people who benefit from improvements have to force the hand of the people who can implement them. I don't know if this looks like a union or something else, but my guess is that in large part it's something else, for instance a sophisticated attempt at building a professional organization that can spread simple standards which organizations can clearly measure themselves against.

I think the reasons this hasn't happened are (a) tech has moved too fast for anyone to actually be able to credibly say how things should be done for longer than a year or two, and (b) attempts at professional organizations borrowed too much from slower-moving physical engineering and so didn't adapt to (a). But I do think it can be done and would benefit the industry greatly (at the cost of slowing things down in the short term). It requires a very 'agile' sense of standards, though. If standards mean imposing big constraints on development, nobody will pay attention to them.

johnnyanmac 3 hours ago | parent | next [-]

You forgot c) we're in a culture where people jump ship every 3-5 years. There's no incentive to learn from mistakes that you don't talk about at the next company, nor any care for the long term health of the current company.

>a sophisticated attempt at building a professional organization that can spread simple standards which organizations can clearly measure themselves against.

We have that in the form of IEEE, but it really doesn't come up much if you're not already neck-deep in the organization.

jakub_g 26 minutes ago | parent [-]

> 3-5 years

That's maybe in Europe. Plenty of US developers these days have a litany of ~1-2 year stints at FAANGs and startups du jour on their CVs.

malfist 3 hours ago | parent | prev [-]

I agree wholeheartedly that collective action is how we stop balancing poor management on the backs of engineers, but good luck getting other engineers to see it that way. There's heaps of propaganda out there telling engineers that if they join a union their high salary will go away, even though unions have never been shown to reduce wages.

ajkjk 3 hours ago | parent | next [-]

My hunch is that software engineers are averse to unions because they correctly perceive that unions are a wide angle away from the type of professional organization that would be most beneficial to them. The industry is sufficiently different that the normal union model is just not very good and has a 'square peg, round hole' feeling.

For instance, by and large the role of organizing is not to get more money but rather to reduce indignities: wasted work, lack of forethought, bad management, arbitrary layoffs, etc. So it is much more about governing management with good practices than about keeping wages up; at least for now, wages are generally high anyway.

There are also reasons to defend jobs/wages in the face of, e.g., outsourcing... but it's almost a separate problem. Maybe there needs to be both a union and an uncoupled professional standard or something?

johnnyanmac 3 hours ago | parent [-]

what type of professional organization is most beneficial? Standards are already out there, but they need a union or government regulation to be enforced. Devs who want real change need to pick their medicine, or continue to let the industry stagnate.

>the role of organizing is not to get more money but rather to reduce indignities

agreed. And I think that's why it's going to really start taking hold as we enter year 4 of mass layoffs in the US (because outsourcing). Alongside overwork from the "survivors" and abusive PIPs to keep people on edge.

fugalfervor 2 hours ago | parent | next [-]

> year 4 of mass layoffs in the US (because outsourcing)

A lot of the layoffs appear to be about conserving cash for investment in AI. In many cases the jobs that are cut are not backfilled by workers in the US or abroad.

nradov 2 hours ago | parent | prev [-]

It's wild to claim that the industry is stagnating. By any objective measure the industry is larger, more influential, and more innovative than ever before. Perhaps the problems that people are complaining about here just don't matter very much?

franktankbank 2 hours ago | parent [-]

> By any objective measure the industry is larger, more influential, and more innovative than ever before

What objective measures would you use?

pluralmonad 2 hours ago | parent | prev | next [-]

I've worked places that refuse to fire low performers, and it's hard for that not to be toxic. I'm not saying this outcome is a foregone conclusion of unions, but my union experience is that poor performers take even longer to get rid of, and I'm not sure I would be interested in that sort of environment again. This is more of an implementation problem than a philosophical one, but theoretically good and practically bad is still just bad.

johnnyanmac 3 hours ago | parent | prev | next [-]

Guess that's why gamedev is the one region where this is really starting to gain momentum. High salaries were already not a thing, and tend to mean nothing if you're laid off after 3 years of development for the release of a new game.

Though I think Gen Z in general will be making waves in the coming years. They can't even get a foot in the door, so why should they care about "high salaries"?

nitwit005 2 hours ago | parent | prev | next [-]

People aren't going to try to wrest control from management because some project is going off the rails. No one has any particular faith their coworkers will run anything better, and the paychecks show up regardless.

nradov 2 hours ago | parent | prev [-]

A union might help improve wages and working conditions in some organizations (although I personally wouldn't want one). But there is zero chance that a union could ever achieve widespread improvement in software architecture, methodologies, or project management. We don't have much consensus on the right way to do things, and what worked well in one circumstance often causes disaster in another.

lazyasciiart 3 hours ago | parent | prev | next [-]

For one project I got so far as to include in the project proposal some outcomes that showed whether or not it was a success: quote from the PM “if it doesn’t do that then we should not have bothered building this”. They objected to even including something so obviously required in the plan.

Waste of my bloody time. Project completed, taking twice as many devs for twice as long, great success, PM promoted. Doesn’t do that basic thing that was the entire point of it. Nobody has ever cared.

Edit to explain why I care: there was a very nice third party utility/helper for our users. We built our own version because “only we can do amazing direct integration with the actual service, which will make it far more useful”. Now we have to support our worse in-house tool, but we never did any amazing direct integration and I guarantee we never will.

SoftTalker 5 hours ago | parent | prev | next [-]

Glad to hear that $FANG has similar incompetency as every other mid-tier software shop I've ever worked in. Your project progression sounds like any of them. Here I was thinking that $FANG's highly-paid developers and project management processes were actually better than average.

jvanderbot 4 hours ago | parent [-]

They can afford to try a lot, why try better?

fishmicrowaver 4 hours ago | parent | prev | next [-]

Reminds me of the military. Senior leaders often have no real idea of what is happening on the ground because the information funneled upward doesn't fit into painting a rosy report. The middle officer ranks don't want to know the truth because it impacts their careers. How can executives even hope to lead their organizations this way?

ndiddy 3 hours ago | parent | next [-]

Well the US has lost every military conflict it's entered for the past 70 years. Since there's been no internal pressure to change methodology, maybe the US military doesn't view winning as necessary.

johnnyanmac 3 hours ago | parent | next [-]

Those past 70 years weren't about winning. It was about making sure the enemies lost more out of it. The US is large and relatively stable and hasn't had to face extended war on its soil since the Civil War 170 years ago. There's no true skin in the game for those who start these wars.

bergesenha 2 hours ago | parent | next [-]

Which is a good strategy, but do you think the Afghans lost more than 2 trillion dollars?

hackandthink 2 hours ago | parent | prev [-]

"The war began on April 12, 1861, when the Confederacy bombarded Fort Sumter in South Carolina"

170 years ago is 1855.

baud147258 an hour ago | parent | prev [-]

> Well the US has lost every military conflict it's entered for the past 70 years.

Operation Just Cause? Desert Storm?

And, depending on how you look at it, the US won the wars in Afghanistan and Iraq, but lost the peace afterwards.

esafak 3 hours ago | parent | prev [-]

By not relying on direct reports for all their information.

Sevii 6 hours ago | parent | prev | next [-]

For how much power they have over team organization and processes, software middle management has nearly no accountability for outcomes.

AlotOfReading 5 hours ago | parent | next [-]

Is it middle management that has no accountability, or executive? Middle and line managers are nearly as targeted by layoff culling as ICs these days in FAANG. The broad processes they're passing down to ICs generally start with someone at director level or higher.

nothatraman 2 hours ago | parent [-]

In my experience it is the constant shifting of goal posts due to execs chasing the next shiny thing, or demanding a feature that they saw somewhere, or heard from client (singular, not plural)

darth_avocado 3 hours ago | parent | prev | next [-]

> For how much power they have over team organization and processes, software middle management has nearly no accountability for outcomes.

Can we also address the fact that “software spend” is distributed disproportionately to management at all levels and people who actually write the software are nickel and dimed. You’d save billions in spend and boost productivity massively if the management is bare bones and is held accountable like the rest of the folks.

jjtheblunt 3 hours ago | parent [-]

that's how the inner sanctum engineering in Apple worked, just like you proposed, at least from 15 years ago to within the last 10 years. i could have been in a very lucky time window to have had that luxury, but it had been an Apple mandate to not have deep hierarchies at least in engineering.

uriegas an hour ago | parent [-]

Maybe it's because of what Steve Jobs mentioned about talented programmers having more power than CEOs, as they can easily switch jobs.

MichaelZuo 5 hours ago | parent | prev [-]

The real question is why would smart competent people continue working under management with blatant ulterior motives that negatively affect them?

Why let their own credibility get dragged down for a second time, third time, fourth time, etc…?

The first time is understandable but not afterwards.

pixelpoet 5 hours ago | parent | next [-]

Astronomical salaries probably have something to do with it.

MichaelZuo 5 hours ago | parent [-]

Yeah that could convince smart competent people to grind their teeth and take a second chance under the same management.

But I don’t think a self respecting person would do that over and over.

raincom 4 hours ago | parent | next [-]

When people live in multi million dollar homes, self-respect doesn't pay monthly mortgage.

teeray 4 hours ago | parent | next [-]

So it's really not the astronomical salary, it's the astronomical debt.

johnnyanmac 3 hours ago | parent [-]

Yes and no. The compensation is a lot, but you're not necessarily able to just quit on a dime even if you live humbly. Interviewing takes weeks now, and weeks more just to find a proper replacement. And salaries can fund you for months, but not years (let alone if you have a family).

I'll also say the obvious here in Sinclair's quote about salaries: you can indeed pay for someone's self respect.

MichaelZuo 2 hours ago | parent [-]

This would imply most of these types of positions are filled with less competent people willing to package and sell their self respect alongside their time?

(Thus commanding a rate similar to a more competent person who doesn’t package it to sell.)

mschuster91 4 hours ago | parent | prev [-]

Joke is, most of these homes aren't worth anywhere close to their paper value.

Cy Porter's home inspection videos... jeez. How these "builders" are still in business is mind-blowing to me (as a German). Here? Some of that shit he shows would lead to criminal charges for fraud.

raincom 3 hours ago | parent [-]

The land is worth more than the structure in these areas.

jrochkind1 3 hours ago | parent | prev | next [-]

You may be over-estimating how many people are self-respecting?

lazide 5 hours ago | parent | prev [-]

Depends on the paycheck.

People will do crazy things for just $100. Including literally get fucked in the ass by a stranger.

7 figures? Ho boy. They’ll use way fancier words though for that.

darth_avocado 3 hours ago | parent | prev | next [-]

In today’s market it’s mostly because of the lack of other options to earn a livelihood

zem 3 hours ago | parent | prev [-]

serious answer - you find a team with a good direct manager who handles all the upward interactions themselves, and then you basically work for that manager, rather than for the company.

game_the0ry 3 hours ago | parent | prev | next [-]

^ This. Not at FAANG, but I am too familiar with this.

This is why software projects fail. We lowly developers always take the blame and management skates. The lack of accountability among decision makers is why things like the UK Post Office scandals happen.

Heads need to be put on pikes. Start with John Roberts, Adam Crozier, Moya Greene, and Paula Vennells.

taeric 4 hours ago | parent | prev | next [-]

Did they go off the rails at the end, or did the deadlines force an acknowledgment that the project is not where folks want it to be?

That said, I think I would agree with your main concern there. If the question is "why did the devs make it so that project management didn't work?", it seems silly not to acknowledge why/how project management should have seen the evidence earlier.

ludicrousdispla 5 hours ago | parent | prev | next [-]

I was a developer for a bioinformatics software startup in which the very essential 'data import' workflow wasn't defined until the release was in the 'testing' phase.

Koshkin 2 hours ago

“I love deadlines. I love the whooshing noise they make as they go by.” ― Douglas Adams

franktankbank 6 hours ago

> wouldn't allow publication until I removed any action items that would constrain management.

That's what we call blameless culture, lol.

bane 6 hours ago

I've also considered a side-effect of this. Each generation of software engineers learns to operate on top of the stack of tech that came before them. This becomes their new operating floor. The generations before, when faced with a problem, would have generally achieved a solution "lower" down in the stack (or at their present baseline). But the generations today and in the future will seek to solve the problems they face on top of that base floor because they simply don't understand it.

This leads to higher and higher towers of abstraction that eat up resources while providing little more functionality than if it was solved lower down. This has been further enabled by a long history of rapidly increasing compute capability and vastly increasing memory and storage sizes. Because they are only interacting with these older parts of their systems at the interface level they often don't know that problems were solved years prior, or are capable of being solved efficiently.

I'm starting to see ideas that will probably form into entire pieces of software "written" on top of AI models as the new floor, where the model basically handles all of the mainline computation, control flow, and business logic. What would have required a dozen MHz and 4 MB of RAM to run now requires TFLOPS and gigabytes, and, being built from a fresh start again, it will fail to learn from any of the lessons learned when it was done 30 years ago and 30 layers down.

seeknotfind 5 hours ago

Yeah, people tend to add rather than improve. It's possible to add into lower levels without breaking things, but it's hard. Growing up as a programmer, I was taught the UNIX philosophy as a golden rule, but there are sharp corners on this one:

> To do a new job, build afresh rather than complicate old programs by adding new "features".

RaftPeople 5 hours ago

> While hardware folks study and learn from the successes and failures of past hardware, software folks do not

I've been managing, designing, building and implementing ERP type software for a long time and in my opinion the issue is typically not the software or tools.

The primary issue I see is lack of qualified people managing large/complex projects because it's a rare skill. To be successful requires lots of experience and the right personality (i.e. low ego, not a person that just enjoys being in charge but rather a problem solver that is constantly seeking a better understanding).

People without the proper experience won't see the landscape in front of them. They will see a nice little walking trail over some hilly terrain that extends for a few miles.

In reality, it's more like the Fellowship of the Ring trying to make it to Mt Doom, but that realization happens slowly.

avemg 4 hours ago

> In reality, it's more like the Fellowship of the Ring trying to make it to Mt Doom, but that realization happens slowly.

And boy do the people making the decisions NOT want to hear that. You'll be dismissed as a naysayer being overly conservative. If you're in a position where your words have credibility in the org, then you'll constantly be asked "what can we do to make this NOT a quest to the top of Mt Doom?" when the answer is almost always "very little".

Wololooo 4 hours ago

Impossible projects with impossible deadlines seem to be the norm, and even when people miraculously pull them off, the lesson learned is not "OK, this worked this time for some reason, but we should not do this again." Then the next people come in and go, "It was done in the past, why can't we do this?"

RaftPeople 3 hours ago

> And boy do the people making the decisions NOT want to hear that.

You are 100% correct. The way I've tried to manage that is to provide the info without appearing to be the naysayer, by giving some options. It makes it seem like I'm on board with the crazy-ass plan and am just trying to find a way to make it successful, like this:

"Ok, there are a few ways we could handle this:

Option 1 is to do ABC first which will take X amount of time and you get some value soon, then come back and do DEF later

Option 2 is to do ABC+DEF at the same time but it's much tougher and slower"

marcosdumay an hour ago

My favorite fact is that every single time an organization manages to make a functional development team that can repeatedly successfully navigate all the problems and deliver working software that adds real value, the high up decision makers always decide to scale the team next.

Working teams are good for a project only, then they are destroyed.

hackthemack 6 hours ago

I have a theory that the churn in technology is by design. If a new paradigm, new language, or new framework comes out every few years, it lets the tech sector always favor hiring new graduates at lower salaries. It gives a thin veneer of "we always want to hire the person who has X" when really they just do not want to hire someone with 10 years of experience in tech who may not have picked up X yet.

I do not think it is the only reason. The world is complex, but I do think it factors into why software is not treated like other engineering fields.

jemmyw 4 hours ago

The problem with that is that it would require a huge amount of coordination for it to be by design. I think it's better to look on it as systemic. Which isn't to say there aren't malign forces contributing.

hackthemack 3 hours ago

I agree. Perhaps "by design" is not the correct phrasing. Many decisions and effects propagate through a multi-weighted graph of complexity (sort of like machine learning).

tra3 4 hours ago

Indeed. How does that saying go? Don’t attribute to malice what can be explained by stupidity?

On the other hand, Microsoft and Facebook did collude to keep salaries low. So who knows.

hackthemack 3 hours ago

Anyone in tech should read up on https://en.wikipedia.org/wiki/High-Tech_Employee_Antitrust_L...

More tech companies were involved in the collusion than many people realize: (1) Apple and Google, (2) Apple and Adobe, (3) Apple and Pixar, (4) Google and Intel, (5) Google and Intuit, and (6) Lucasfilm and Pixar.

It was settled out of court. One of the plaintiffs was very vocal that the settlement was a travesty of justice. The companies paid less in the settlement than the amount they saved by colluding to keep wages down.

https://www.mercurynews.com/2014/06/19/judge-questions-settl...

SoftTalker 4 hours ago

Constantly rewriting the same stuff in endless cycles of new frameworks and languages gives an artificial sense of productivity and justifies its own existence.

If we took the same approach to other engineering, we'd be constantly tearing down houses and rebuilding them just because we have better nails now. It sure would keep a lot of builders employed though.

pietervdvn 3 hours ago

We do take down a lot of old buildings (or renovate them thoroughly) because the old buildings contain asbestos, are not properly insulated, ...

Hemospectrum 3 hours ago

> If we took the same approach to other engineering, we'd be constantly tearing down houses and rebuilding them just because we have better nails now. It sure would keep a lot of builders employed though.

This is almost exactly what happens in some countries.

bdangubic 3 hours ago

which one(s)?

Gigachad 2 hours ago

Pretty common in Australia. There are heritage laws to try to prevent replacing all the old buildings, but often the buildings are so undesirable that the owner just leaves them vacant until trespassers manage to burn them down.

hackthemack 3 hours ago

I agree. But I think the execs just say, "How can we get the most bang for our buck? If we use X, Y, Z technologies that are the new hotness, then we will get all the new hordes of hires out there, which will make them happy, and has the added benefit of paying them less."

QuercusMax 5 hours ago

I think part of it is that reading code isn't a skill that most people are taught.

When I was in grad school ages ago, my advisor told me to spend a week reading the source code of the system we were working with (TinyOS), and come back to him when I thought I understood enough to make changes and improvements. I also had a copy of the Linux Core Kernel with Commentary that I perused from time to time.

Being able to dive into an unknown codebase and make sense of where the pieces are put together is a very useful skill that too many people just don't have.

spit2wind 11 minutes ago

I'm curious, what does "read code" mean to you? What does that skill look like and how is it taught?

jsrcout an hour ago

Reading (someone else's) code is a whole lot harder than writing it. Which is unfortunate because I do an awful lot of it at work.

gorbachev 2 hours ago

Being good at reading code isn't a skill that helps large software projects stay on rails.

It's more about being good at juggling 1000 balls at the same time. It's 99.9% of the time a management problem, not a software problem.

alangibson 5 hours ago

"While hardware folks study and learn from the successes and failures of past hardware, software folks do not." Couldn't be further from the truth. Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult (see microservices for example).

Once you've worked in both hardware and software engineering, you quickly realize that they are only superficially similar. Software is fundamentally philosophy, not physics.

Hardware is constrained by real world limitations. Software isn't except in the most extreme cases. Result is that there is not a 'right' way to do any one thing that everyone can converge on. The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.

jemmyw 4 hours ago

Software doesn't operate in some magical realm outside of the physical world. It very much is constrained by real world limitations. It runs on the hardware that itself is limited. I wonder if some failures are a result of thinking it doesn't have these limitations?

moritz 2 hours ago

As the great Joe Armstrong used to say, “a lot of systems actually break the laws of physics”[1] — don’t program against the laws of physics.

> In distributed systems there is no real shared state (imagine one machine in the USA another in Sweden) where is the shared state? In the middle of the Atlantic? - shared state breaks laws of physics. State changes are propagated at the speed of light - we always know how things were at a remote site not how they are now. What we know is what they last told us. If you make a software abstraction that ignores this fact you’ll be in trouble.[2]

[1]: “The Mess We’re In”, 2014 https://www.youtube.com/watch?v=lKXe3HUG2l4

[2]: https://news.ycombinator.com/item?id=19708900
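Armstrong's point maps fairly directly onto code. The Python below is a minimal sketch, not anything from the talk: `Report` and `RemoteView` are hypothetical names, and it assumes each message carries a timestamp from the sending machine's own clock (so timestamps are only compared between reports from the same sender). The local object never claims to hold the remote state, only the last report that arrived.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Report(Generic[T]):
    """What a remote node last told us, stamped with the sender's clock reading."""
    value: T
    sent_at: float  # seconds, according to the remote node's own clock (assumption)


class RemoteView(Generic[T]):
    """Local, possibly stale view of state owned by another machine.

    There is no shared state here: we only hold the most recent report
    that has arrived, and keep its timestamp so callers can reason about
    how old it might be.
    """

    def __init__(self) -> None:
        self._latest: Optional[Report[T]] = None

    def receive(self, report: Report[T]) -> None:
        # Messages can arrive out of order; ignore anything not newer than what we have.
        if self._latest is None or report.sent_at > self._latest.sent_at:
            self._latest = report

    def last_known(self) -> Optional[Report[T]]:
        # Deliberately not called "current": this is how things *were*.
        return self._latest


view: RemoteView[int] = RemoteView()
view.receive(Report(value=10, sent_at=100.0))
view.receive(Report(value=7, sent_at=90.0))  # arrived late, but older: ignored
print(view.last_known())  # Report(value=10, sent_at=100.0)
```

Callers always get a timestamped "as of" answer rather than a bare value that looks current. Real systems need more than this (clock skew across nodes, retries, conflict resolution), which the sketch ignores.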

SoKamil 3 hours ago

> It very much is constrained by real world limitations. It runs on the hardware that itself is limited

And yet we scale the shit out of it, shifting limitations further and further. On that scale different problems emerge and there is no single person or even single team that could comprehend this complexity in isolation. You start to encounter problems that have never been solved before.

IshKebab an hour ago

I disagree. At least at the RTL level they're very similar. You don't really deal with physics there, except for timing (which is fairly analogous with software performance things like hard real-time constraints).

> Result is that there is not a 'right' way to do any one thing that everyone can converge on.

Are you trying to say there is in hardware? That must be why we have exactly one branch predictor design, lol

> The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.

"The first function call looks a whole lot like a function call today..."

Sharlin 5 hours ago

What you and the GP said are not mutually exclusive. Software engineers are quick to drink every new Kool-Aid out there, which is exactly why we’re so damned blind to history and lessons learned before.

nitwit005 4 hours ago

Most of the time, there's no need to study anything. Any experienced software engineer can tell you about a project they worked on with no real requirements, management constantly changing their mind, etc.

mbesto 5 hours ago

I would boil this down to something else, but possibly related: project requirements are hard. That's it.

> While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning.

For most IT projects, software folks generally can NOT "pull apart" old systems, even if they wanted to.

> Typically, software folks build new and every generation of software developers must relearn the same problems.

Project management has gotten way better than it was 20 years ago, so there are definitely some lessons that have been passed on.

rawgabbit 33 minutes ago

A CIO once told me that with Agile we didn’t need requirements. He thought my suggestion to document the current system before modifying it was a complete waste of time. Instead, he made all the developers go through a customer service workshop on how to handle and communicate with customers. Cough cough… most developers do not talk with customers. Where we worked, developers took orders from product and project people whose titles changed every year but who operated with the mindset of a drill sergeant. My way or the highway.

ctkhn 5 hours ago

In my experience, a lot of the time the people who COULD be solving these issues are people who used to code, or who never have. The actual engineers who might do something like this aren't given authority or scope, and you have MBAs or scrum masters in the way of actually solving problems.

raincom 4 hours ago

Some consequences of NOT learning from prior successes and failures: (a) no more training for the next generation of developers/engineers (b) fighting for the best developers, and this manifests in leetcode grinding (c) decrease in cooperation among team mates, etc.

smj-edison 3 hours ago

As someone who's learning programming right now, do you have any suggestions on how one would go about finding and studying these successes and failures?

keeda 3 hours ago

I think there is a ton more nuance, but can still be explained by a simple observation, which TFA hints at: "It's the economics, stupid."

Engineering is the intersection of applied sciences, economics and business. The economics aspect is almost never recognized and explains many things. Projects of other disciplines have significantly higher costs and risks, which is why they require a lot more rigor. Taking hardware as example, one bad design decision can sink the entire company.

On the other hand, software has economics that span a much more diverse range than any other field. Consider:

- The capital costs are extremely low.

- Development can be extremely fast at the task level.

- Software, once produced, can be scaled almost limitlessly for very cheap almost instantly.

- The technology moves extremely fast. Most other engineering disciplines have not fundamentally changed in decades.

- The technology is infinitely flexible. Software for one thing can very easily be extended for an adjacent business need.

- The risks are often very low, but can be very high at the upper end. The rigor applied scales accordingly. Your LoB CRUD app going down might bother a handful of people, so who cares about tests? But your flight control software better be (and is) tested to hell and back.

- Projects vary drastically in stacks, scopes and risk profiles, but the talent pool is more or less common. This makes engineering culture absolutely critical because hiring is such a crapshoot.

- Extreme flexibility also masks the fact that complexity compounds very quickly. Abstractions enable elegant higher-level designs, but they mask internal details that almost always leak and introduce minor issues that cause compounding complexity.

- The business rules that software automates are extremely messy to begin with (80K payroll rules!) However, the combination of a) flexibility, b) speed, and c) scalability engender a false sense of confidence. Often no attempt is made at all to simplify business requirements, which is probably where the biggest wins hide. This is also what enables requirements to shift all the time, a prime cause for failures.

Worse, technical and business complexity can compound. E.g., it's very easy to think "80K payroll rules linearly means O(80K) software modules" and not "wait, maybe those 80K payroll rules interact with each other, so it's probably a super-linear growth in complexity." Your architecture is then oriented towards the simplistic view, and needs hacks when business reality inevitably hits, which then start compounding complexity in the codebase.
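A quick back-of-the-envelope check of the super-linear point, as a Python sketch. It assumes, purely for illustration, that any two of the 80K rules could interact; real rule sets are sparser, but the number of pairs someone has to reason about (or test) still grows as n(n-1)/2 rather than n. `possible_pairs` is a hypothetical helper name.

```python
from math import comb


def possible_pairs(n_rules: int) -> int:
    """Distinct pairs of rules that could, in principle, interact: n(n-1)/2."""
    return comb(n_rules, 2)


for n in (100, 1_000, 80_000):
    print(f"{n:>6,} rules -> {possible_pairs(n):>13,} possible pairwise interactions")
```

For 80,000 rules this reports 3,199,960,000 possible pairs. Going from 100 to 80,000 rules is an 800x increase in rules but more than a 600,000x increase in possible pairs, and that is before counting three-way interactions.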

And of course, if that's a contract up for bidding, your bid is going to be unsustainably low, which will be further depressed by the competitive bidding process.

If the true costs of a project -- which include human costs to the end users -- are not correctly evaluated, the design and rigor applied will be correspondingly out of whack.

As such I think most failures, in addition to regular old human issues like corruption, can be attributed to an insufficient appreciation of the economics involved, driven primarily by overindexing on the powers of software without an appreciation of the pitfalls.

pphysch 6 hours ago

There are rational explanations for this. When software fails catastrophically, people almost never die (considering how much software crashes every day). When hardware fails catastrophically, people tend to die, or lose a lot of money.

There's also the complexity gap. I don't think giving someone access to the Internet Explorer codebase is necessarily going to help them build a better browser. With millions of moving parts it's impossible to tell what is essential, superfluous, high quality, low quality. Fully understanding that prior art would be a years long endeavor, with many insights no doubt, but dubious.

MarcelOlsz 4 hours ago

I think this is a downstream effect of there being no real regulation or professional designations in software. That leads to every company and team being wildly different, which leads to no standards, which leaves no time for anything but crunching: since there are no barriers restricting demands on your time, nobody spends time doing much besides shipping constantly.

wesammikhail 6 hours ago

Agree 100%.

I know a lot of people on here will disagree with me saying this, but this is exactly how you get an ecosystem like JavaScript's: as fragmented, insecure, and "trend prone" as the old-school WordPress days. It's the same problems over and over, and every new "generation" of programmers has to relearn the lessons of old.

Salgat 5 hours ago

The difficulty lies in the fact that, for most software, it is quite cheap to produce very complex designs compared to hardware. For software designs treated similarly to hardware (such as in medical devices or at NASA), you do gain back those benefits, at great expense.

tristor 7 hours ago

This is one part of the issue. The other major piece of this that I've seen over more than two decades in industry is that most large projects are started and run (not necessarily by the same person) by non-technical people who are exercising political power, rather than by technical people who can achieve the desired outcomes. When you put the nexus of power into the hands of non-technical people in a technical endeavor, you end up with outcomes that don't match expectations. Larger-scale projects suffer deeply from "not knowing what we don't know" at the top.

mbesto 5 hours ago

If this were true all of the time then the fix would be simple - only have technical people in charge. My experience has shown that this (only technical people in charge) doesn't solve the problem.

tristor 3 hours ago

Success pretty much requires putting technical people in charge, but that doesn't mean putting technical people in charge is sufficient for success to happen. We have plenty of data over the last 40 years to prove my case. Furthermore, what it means to be a "technical person" is unfortunately not so simple to define, as the easy ways to codify it often exclude the very people you want involved.

Suffice to say, projects are significantly more likely to succeed when the power in the project is held by people who are competent /and/ understand the systems they are working with /and/ understand the problem domain you are developing a solution in. Whether or not they have a title like "engineer" or have a technical degree, or whatever other hallmark you might choose is largely irrelevant. What matters is competency and understanding, and then ultimately accountability.

Most large projects I've been a part of or near lacked all three of these things, and thus were fundamentally doomed to failure before they ever began. The people in power lacked competency and understanding, the entire project team and the people in power lacked accountability, and competency was unevenly distributed amongst the project team.

It may feel pithy, but it really is true that in many large projects the fundamental issue that leads to failure is that the decision makers don't know what they're doing and most of the implementers are incompetent. We can always root cause further to identify the incentive structures in society, and particularly in public/government projects that lead to this being true, but the fact remains at the project level this is the largest problem in my observation.

fragmede 4 hours ago

If people didn’t work, maybe we should put an LLM in charge instead.

chileRick 3 hours ago

Boeing has entered the chat

smokel 2 hours ago

I'm not entirely sure what you mean with "technical people" but it seems that you may not appreciate the problems that "non-technical people" try to tackle.

Do your two decades of experience cover both sides?

tristor an hour ago

> Do your two decades of experience cover both sides?

Yes.

I appreciate both sides and have a wealth of experience in both. The challenge is that all the non-technical problems cannot be solved successfully while lacking a technical understanding. Projects generally don't fail for technical reasons, they fail because they were not set up for success, and that starts with having a clear understanding of requirements, feasibility, and a solid understanding of both the current state and the path to reach your desired outcomes, both politically/financially and technically.

I was an engineer for more than a decade, I've been in Product for nearly a decade, and I'm now a senior manager in Product. I can honestly say that I have the necessary experience to hold strong opinions here and to be correct in those opinions.

You need technical people who can also handle some of the non-technical aspects of a project with the reins of power if you want the project to succeed, otherwise it is doomed by the lack of understanding and competency of those in charge.

cjbgkagh 6 hours ago

Sometimes giving people what they want can be bad for them; management wants cheap compliant workers, management gets cheap compliant workers, and then the projects fall apart in easily predictable and preventable ways.

Because such failures are so common, management typically isn’t punished when they happen, so it’s hard to keep interests in line. And because many producers are run on a cost-plus basis, there can be a perverse incentive to do a bad job, or at least to avoid doing a good one.

mstipetic 6 hours ago

I was so annoyed when I found out about the OTP library and realized we’d been reinventing things for 20+ years.

begueradj 3 hours ago

Indeed.

That's why every now and then we see "new" programming paradigms that were once considered obsolete.

jcelerier 5 hours ago

... are you saying that hardware projects fail less often than software ones? Just building a bridge is something that fails regularly all over the world. Every chip comes with an errata list longer than my arm.

01100011 3 hours ago

Software folks treat their output as if it's their baby or their art projects.

Hardware folks just follow best practices and physics.

They're different problem spaces though, and having done both I think HW is much simpler and easier to get right. SW is often similar if you're working on a driver or some low-level piece of code. I tried to stay in systems software throughout my career for this reason. I like doing things 'right' and don't have much need to prove to anyone how clever I am.

I've met many SW folks who insist on thinking of themselves as rock stars. I don't think I've ever met a HW engineer with that attitude.

esafak 2 hours ago

Because the software market is bigger and more competitive; hardware is mature.

__mharrison__ 2 hours ago

What are the silver bullets... I mean, best practices that keep getting ignored?

neilv 5 hours ago

On some of the infamous large public IT project failures, you just have to look at who gets the contract, how they work, and what their incentives are. (For example, don't hire management consulting partner smooth talkers, and their fleet of low-skilled seat-warmers, to do performative hours billing.)

It's also hard when the team actually cares, but there are skills you can learn. Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).

But once you're a bit comfortable with the art and science of those, big new challenges are more about political and environment reality. It comes down to alignment and competence of: workers, internal team leadership, partners/vendors, customers, and investors/execs.

Discussing this is a little awkward, but maybe start with alignment, since most of the competence challenges are rooted in mis-alignments: never developing nor selecting for the skills that alignment would require.

cheesecompiler an hour ago

Right, it's largely politically and ego driven; a people not a software problem.

JBlue42 3 hours ago

> Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).

Was there any literature or other findings that you came across that ended up clicking and working for you that you can recommend to us?

neilv 2 hours ago

I could blather for hours around this space. A few random highlights:

* The very first thing I read about requirements was Weinberg, and it's still worth reading. (Even if you are a contracting house, with a hopeless client, and you want to go full reactive scrum participatory design, to unblock you for sprints with big blocks of billable hours, not caring how much unnecessary work you do... at least you will know what you're not doing.)

* When interviewing people about business or technical, learn to use a Data Flow Diagram. You can make it accessible to almost everyone, as you talk through it, and answer all sorts of questions, at a variety of levels. There are a bunch of other system modeling tools you can use, at times, but do not underestimate the usefulness and accessibility of a good DFD.

* If you can (or have to) plan at all, find and learn to use a serious Gantt chart centric planning tool (work breakdown, dependencies, resource allocations, milestones), and keep it up to date (which probably includes having it linked with whatever task-tracking tool you use, but you'll usually also be changing it for bigger-picture reasons too). Even if you are a hardware company, with some hard external-dependency milestones, you will be changing things around those unmoveables. And have everyone work from the same source of truth (everyone can see the same Gantt chart and the task

* Also learn some kind of Kanban-ish board for tasking, and have it be an alternative view on the same data that's behind the Gantt view and the tasks/issues that people can/should/are working on at the moment, and anything immediately getting blocked.

* In a rare disruptive startup emergency, know when to put aside Gantt, and fall back to an ad hoc text file or spreadsheet of chaos-handling prioritization that's changing multiple times per day. (But don't say that your startup is always in emergency mode and you can never plan anything, because usually there is time for a Kanban board, and usually you should all share an understanding of how those tasks fit into a larger plan, and trace back to your goals, even if it's exploratory or reactive.)

* Culture of communicating and documenting, in low-friction, high-value, accessible ways. Respect it as team-oriented and professional.

* Avoid routine meetings; make it easy to get timely answers and discussion, as soon as possible. This includes reconsidering how accessible upper leadership should be: can you get closer to being responsive to the needs of the work on the project (e.g., if anyone needs a decision from the director/VP/etc., then quickly prep and ask, maybe with an async message, but don't wait for weekly status meeting or to schedule time on their calendar).

* Avoid unnecessary process. Avoid performances.

* People need blocks of time when they can get flow. Sometimes for plowing through a big chunk of stuff that only requires basic competence, and sometimes when harder thinking is required.

* Be very careful with individual performance metrics. Ideally you can incentivize everyone to be aligned towards team success, through monetary incentives (e.g., real equity whose value they can affect) and through culture (everyone around you seems to work as a team, and you like that, and that inspires you). I would even start by asking if we can compensate everyone equally, shun titles, etc., and how close we can practically get to that.

* Be honest about resume-driven development. It doesn't have to be a secret misalignment. Don't let it be motivated solely as a secret goal of job-hoppers that is then lied about, or it will probably be to the detriment of your company (and also, that person will job-hop, fleeing the mess they made). If you're going to use a new resume-keyword framework for a project, the whole team should be honest that, say, there are elements of wanting to potentially get some win from it, wanting to trial it for possible greater use and build up organizational expertise in it, and also that it's a very conscious and honest perk for the workers to get to use the new toy.

* Infosec is an unholy dumpster fire, throughout almost the entire field. Decide if you want to do better, and if so, then back it up with real changes, not CYA theatre and what someone is trying to sell you.

* LeetCode frat-pledging interviews select for so much misaligned thinking, and they also signal that you are probably just more of the same as the low bar of our field, so people shouldn't take you seriously when you try to tell them you want to do things better.

* Nothing will work well if people aren't aligned and honest.

twidledee 7 minutes ago | parent | prev | next [-]

The difference between success and failure of large projects comes down to technical leadership. I've seen it time and time again. Projects that are managed by external consulting companies (name brand or otherwise) have a very poor track record of delivering. An in-house technical lead that is committed to the success of the project will always do better. And yes, this technical lead must have the authority to limit the scope of the system rewrite. Endless scope creep is a recipe for failure. Outside consulting firms will never say "No" to any new request - it means more business for them - their goals are not aligned with the client.

ChrisMarshallNY 5 hours ago | parent | prev | next [-]

> While hardware folks study and learn from the successes and failures of past hardware, software folks do not.

I guess that’s the real problem I have with SV’s endemic ageism.

I was personally offended when I encountered it myself, but that’s long past.

I just find it offensive that experience is ignored, or even shunned.

I started in hardware, and we all had a reverence for our legacy. It did not prevent us from pursuing new/shiny, but we never ignored the lessons of the past.

pork98 5 hours ago | parent [-]

Why do you find it offensive? It’s not personal. Someone who thought Webvan was a great lesson in hubris could not have built an Instacart, right? Even evolution shuns experience, all but throwing most of it out each generation, with a scant few species as exceptions.

Bjartr 4 hours ago | parent | next [-]

> Someone who thought webvan was a great lesson in hubris could not have built an Instacart, right?

Not at all. The mistake to learn from in Webvan's case was expanding too quickly and investing in expensive infrastructure all before achieving product-market fit. Not that they delivered groceries.

pkilgore 5 hours ago | parent | prev | next [-]

I think you're mistaking the funding and starting of companies with the execution of their vision through software engineering -- the entire point of the article, and the OP.

antonvs 2 hours ago | parent | prev [-]

This is a classic straw man argument, which depends on the assumption that all people of a certain age would think a certain way.

Also, your understanding of evolution is incorrect. All species on Earth are the results of an enormous amount of accumulated "experience", over periods of up to billions of years. Even the bacteria we have today took hundreds of millions of years to reach anything similar to their current form.

0xbadcafebee 5 hours ago | parent | prev | next [-]

Software projects fail because humans fail. Humans are the drivers of everything in our world. All government, business, culture, etc... it's all just humans. You can have a perfect "process" or "tool" to do a thing, but if the human using it sucks, the result will suck. This means that the people involved are what determines if the thing will succeed or fail. So you have to have the best people, with the best motivations, to have a chance for success.

The only thing that seems to change this is consequences. Take a random person and just ask them to do something, and whether they do it or not is just based on what they personally want. But when there's a law that tells them to do it, and enforcement of consequences if they don't, suddenly that random person is doing what they're supposed to. A motivation to do the right thing. It's still not a guarantee, but more often than not they'll work to avoid the consequences.

Therefore if you want software projects to stop failing, create laws that enforce doing the things in the project to ensure it succeeds. Create consequences big enough that people will actually do what's necessary. Like a law, that says how to build a thing to ensure it works, and how to test it, and then an independent inspection to ensure it was done right. Do that throughout the process, and impose some kind of consequence if those things aren't done. (the more responsibility, the bigger the consequence, so there's motivation commensurate with impact)

That's how we manage other large-scale physical projects. Of course those aren't guaranteed to work; large-scale public works projects often go over-budget and over-time. But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process to encourage humans to do the right thing.

farrelle25 3 hours ago | parent | next [-]

> Software projects fail because humans fail. Humans are the drivers of everything in our world.

Ah finally - I've had to scroll halfway down to find a key reason big software projects fail.

<rant>

I started programming in 1990 with PL/1 on IBM mainframes and for 35 years have dipped in and out of the software world. Every project I've seen fail was mainly down to people - egos, clashes, laziness, disinterest, inability to interact with end users, rudeness, lack of motivation, toxic team culture etc etc. It was rarely (never?) a major technical hurdle that scuppered a project. It was people and personalities, clashes and confusion.

</rant>

Of course the converse is also true - big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect. Of course, most of these projects were bland corporate business data ones... so not technically very challenging. But still big enough software projects.

Geez... don't know why I'm getting so emotional (!), but the hard-core software engineering world is all about people at the end of the day.

treespace8 3 hours ago | parent [-]

> big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect.

I completely agree. I would just like to add that this only works where the inspired leaders are properly incentivized!

beezlebroxxxxxx 5 hours ago | parent | prev [-]

If software engineers want to be referred to as "engineers" then they should actually learn about engineering failures. The industry and educational pipeline (formal and informal) as a whole is far more invested in butterfly chasing. It's immature in the sense that many people with decades of experience are unwilling to adopt many proven practices in large scale engineering projects because they "get in the way" and because they hold them accountable.

ThaDood 6 hours ago | parent | prev | next [-]

So, I'm not a dev or a project manager, but I found this article very enlightening. At the risk of asking a stupid question and getting an RTFM or an LMGTFY, can anyone provide some simple and practical examples of software successes at a big scale? I work at a hospital, so healthcare-specific examples would be ideal, but I'll take anything.

FWIW I have read The Phoenix Project and it did help me get a better understanding of "Agile" and the DevOps mindset but since it's not something I apply in my work routinely it's hard to keep it fresh.

My goal is to try to plant seeds of success in the small projects I work on and eventually ask questions that get people to think from a similar perspective.

BenoitEssiambre 6 hours ago | parent | next [-]

Unix and Linux would be your quintessential examples.

Unix was an effort to take Multics, an operating system that had gotten too modular, and integrate the good parts into a more unified whole (book recommendation: https://www.amazon.com/UNIX-History-Memoir-Brian-Kernighan/d...).

Even though there were some benefits to the modularity of Multics (apparently you could unload and replace hardware in Multics servers without reboot, which was unheard of at the time), it was also its downfall. Multics was eventually deemed over-engineered and too difficult to work with. It couldn't evolve fast enough with the changing technological landscape. Bell Labs' conclusion after the project was shelved was that OSs were too costly and too difficult to design. They told engineers that no one should work on OSs.

Ken Thompson wanted a modern OS so he disregarded these instructions. He used some of the expertise he gained while working on Multics and wrote Unix for himself (in three weeks, in assembly). People started looking over Thompson's shoulder being like "Hey what OS are you using there, can I get a copy?" and the rest is history.

Brian Kernighan described Unix as "one of" whatever Multics was "multiple of". Linux eventually adopted a similar architecture.

More here: https://benoitessiambre.com/integration.html

prmph 5 hours ago | parent [-]

Are you equating success with adoption or use? I would say there is lots of software that is widely used but is a mess.

What would be a competitor to linux that is also FOSS? If there's none, how do you assess the success or otherwise of Linux?

Assume Linux did not succeed but was adopted: what would that scenario look like? Is the current situation with it different from that?

gishh 3 hours ago | parent [-]

> What would be a competitor to linux that is also FOSS? If there's none, how do you assess the success or otherwise of Linux?

*BSD?

As for large, successful open source software: GCC? LLVM?

hi_hi 3 hours ago | parent | prev | next [-]

This is a noble and ambitious goal. I feel qualified to provide some pointers, not because I have been instrumental in delivering hugely successful projects, but because I have been involved, in various ways, in many, many failed projects. Take what you will from that :-)

- Define "success" early on. This usually doesn't mean meeting a deadline on time and budget. That is actually the start of the real goal. The real success should be determined months or years later, once the software and processes have been used in a production business environment.

- Pay attention to Conway's Law. Fight this at your peril.

- Beware of the risk of key people. This means if there is a single person who knows everything, you have a risk if they leave or get sick. Redundancy needs to be built into the team, not just the hardware/architecture.

- No one cares about preventing fires from starting. They do care about fighting fires late in the project and looking like a hero. Sometimes you just need to let things burn.

- Be prepared to say "no", a lot. (This is probably the most important one, and the hardest.)

- Define ownership early. If no one is clearly responsible for the key deliverables, you are doomed.

- Consider the human aspect as much as the technical. People don't like change. You will be introducing a lot of change. Balancing this needs to be built into the project at every stage.

- Plan for the worst, hope for the best. Don't assume things will work the way you want them to. Test _everything_, always.

[Edit. Adding some items.]

johnnyanmac 3 hours ago | parent [-]

>No one cares about preventing fires from starting. They do care about fighting fires late in the project and looking like a hero. Sometimes you just need to let things burn.

As a Californian, I hate this mentality so much. Why can't we just have a smooth release with minimal drama because we planned well? Maybe we could properly fix some tech debt or even polish up some features if we're not spending the last 2 months crunching on some showstopper that was pointed out a year ago.

shagmin 5 hours ago | parent | prev | next [-]

I find it kind of hard to define success or failure. Google search and Facebook are a success right? And they were able to scale up as needed, which can be hard. But the way they started is very different from a government agency or massive corporation trying to orchestrate it from scratch. I don't know if you'd be familiar with this, but maybe healthcare.gov is a good example... it was notoriously buggy, but after some time and a lot of intense pressure it was dealt with.

fragmede 5 hours ago | parent [-]

The untold story is of landing software projects at Google. Google has landed countless software projects internally in order for Google.com to continue working, and the story of those will never reach the light of day, except in back-room conversations never to be shared publicly. How did they go from internal platform product version one to version two? It's an amazing feat of engineering that can't be shown to the public, which is a loss for humanity, honestly, but capitalism isn't going to have it any other way.

SoftTalker 4 hours ago | parent [-]

Are you saying this from firsthand experience? Because it sounds like the sort of myth that Google would like you to believe. Much more believable is that their process is as broken and chaotic as most software projects are, they are just so big that they manage to have some successes regardless. Survivorship bias. A broken clock is still right twice a day.

johnnyanmac 3 hours ago | parent | next [-]

That's my entire industry, so I can believe it. I'd love to learn large-scale game architecture, but it simply isn't public. At best you can dig into the source-available, 30-year-old legacy code of Unreal Engine as a base. But extracting architecture from the source is like looking at a building without a schematic.

Your best bet is the $500 GDC Vault, which offers relative scraps of a schematic, and then making your own from those experiences.

fragmede 4 hours ago | parent | prev [-]

I was an SRE on their Internet traffic team for three years, from 2020 til 2023. The move from Sisyphus to Legislator is something I wish the world could see documented in a museum, like the moving of the Cape Hatteras Lighthouse.

solatic 3 hours ago | parent | prev | next [-]

India's UPI (digital payments) is almost as big a scale as it gets, and it's pretty universally considered a success: https://en.wikipedia.org/wiki/Unified_Payments_Interface

spit2wind 3 hours ago | parent | prev [-]

I heard Direct File was pretty successful. Something like 94% reported it as a positive experience.

mdavid626 7 hours ago | parent | prev | next [-]

It’s so “nice” to know, that trillions spent on AI not only won’t make this better, but it’ll make it significantly worse.

keeda 4 hours ago | parent | next [-]

Not really, by most indications AI seems to be an amplifier more than anything else. If you have strong discipline and quality control processes it amplifies your throughput, but if you don't, it amplifies your problems. (E.g. see the DORA 2025 report.)

So basically things will still go where they were always going to go, just a lot faster. That's not necessarily a bad thing.

johnnyanmac 2 hours ago | parent | next [-]

>If you have strong discipline and quality control processes

You're placing a lot of faith in this if-statement, in an article that pretty much says we in fact lack strong discipline and quality control.

keeda 17 minutes ago | parent [-]

I meant it more as an observation than an optimistic prediction, really :-)

The article is sound, but its focus on large public failures disregards the vast, vast, vast majority of the universe of software projects that nobody really thinks about, because they mostly just work -- websites and mobile apps and games and internal LoB CRUD apps and cloud services and the huge ecosystem of open source projects and enterprise and hobby software.

Without some consideration of that, we cannot really generalize this article to reflect the "success rate" of our industry.

That said, I think the acceleration introduced by AI is overall a "Good Thing (tm)" simply because, all else being equal, it's generally better to fail faster rather than later.

mdavid626 4 hours ago | parent | prev [-]

Yes, AI can help, but it won’t. That’s my point.

In practice, it will make people care and pay attention even less. These big disasters will be written by people without any skills, using AI.

keeda 3 hours ago | parent [-]

But my point wasn't about AI helping or not, my point was AI will simply accelerate the natural trajectory of your organization.

This is not a hypothetical, this is based on reports using large-scale data like DORA and DX: https://blog.robbowley.net/2025/11/05/findings-from-dxs-2025...

Edited to add: To clarify, I meant that if an organization was going to deliver a billion-dollar boondoggle of a project, AI will not change that outcome, but it WILL help deliver it faster. That's why I said it's not necessarily a bad thing: as in software, it's generally better to fail faster.

fransje26 5 hours ago | parent | prev [-]

"Worse" won't even begin to describe the economic crisis we will be in once the bubble bursts.

And although that, in itself, should be scary enough, it is nothing compared to the political tsunami and unrest it will bring in its wake.

Most of the Western world is already on shaky political ground, flirting with the extreme-right. The US is even worse, with a pathologically incompetent administration of sociopaths, fully incapable of coming up with the measures necessary to slow down the train of doom careening out of control towards the proverbial cliff of societal collapse.

If societal tensions are already close to the breaking point now, in a period of relative economic prosperity, I cannot begin to imagine what they will be like once the next financial crash hits, especially one measured in the trillions of dollars.

They say that humanity progresses through episodes of turmoil and crisis. Now that we literally have all the knowledge of the world at our fingertips, maybe it is time to progress past this inadequate primeval advancement mechanism, and to truly enter an enlightened age where progress is made from understanding, instead of crises.

Unfortunately, it looks like it's going to take monumental changes to stop the parasites and the sociopaths from making a quick buck at the expense of humanity.

dvrp 28 minutes ago | parent | prev | next [-]

https://en.wikipedia.org/wiki/Productivity_paradox

smithkl42 2 hours ago | parent | prev | next [-]

Plausible article, but it reads like a preschooler frustrated that his new toy is broken. "Fix it! Make it work!" - without ever specifying how.

Granted, this is an exceedingly hard problem, and I suppose there's some value in reminding ourselves of it - but I'd much rather read thoughts on how to do it better, not just complaints that we're doing it poorly.

SatvikBeri 4 hours ago | parent | prev | next [-]

Do non-software projects succeed at a higher rate in any industry? I get the impression that projects everywhere go over time, over budget, and frequently get canceled.

vb-8448 an hour ago | parent | prev | next [-]

The main problems are incentives and risks: in most cases you are not incentivized to build secure and reliable software because, most of the time, it's easy to fix it later. With particular categories of software (e.g., software deployed on remote systems, medical software, military software) or hardware, it's the opposite: a failure is not so easy to fix, so you are incentivized to do a better job.

The second problem is big cons.

zaptheimpaler 2 hours ago | parent | prev | next [-]

This should be a criticism of the kinds of bloated firms that take on large government projects, the kinds of people they hire, the incentives at play, the bidding processes, the corruption and all the rest. It has very little to do with software and more just organizations that don't face any pressure to deliver.

serial_dev 4 hours ago | parent | prev | next [-]

This is what I’ve been thinking about when I talk to other people in software development who can’t stop talking about how efficient they are with AI… yet they haven’t shipped anything in their startup or side project, or, in a corporate setting, the project is still bug-riddled and the performance is poor, and now the code quality suffers too, as people barely read what Cursor (etc.) is spitting out.

I have “magical moments” with these tools, sometimes they solve bugs and implement features in 5 minutes that I couldn’t do in a day… at the same time, quite often they are completely useless and cause you to waste time explaining things that you could probably just code yourself much faster.

John23832 5 hours ago | parent | prev | next [-]

I often see big money put behind software projects, but the money then makes stakeholders feel entitled to get in the way.

bigbuppo 6 hours ago | parent | prev | next [-]

As someone who has seen technological solutions applied when they make no sense, I think the next revolution in business processes will be de-computerization. The trend has probably already started, thanks to one of the major cloud outages.

stock_toaster 5 hours ago | parent [-]

> de-computerization

I would think cloud-disconnectedness (e.g., computers without cloud-hosted services) would come well before de-computerization.

sandeepkd 2 hours ago | parent | prev | next [-]

A slightly different take: it's probably more of a people failure, the lack of the required expertise, skill set, motivation and coordination. People are motivated to do the job to make a living; the success of a long-term project is rarely the driving factor for most people working on it. People would know ahead of time when a project is heading toward failure; it's just how things are structured. From a systems perspective, an unknown system or set of requirements would be a good example of where you build iteratively, while a known set of requirements should give a good enough idea of feasibility and rough timelines, even if it's complex.

dmix 6 hours ago | parent | prev | next [-]

So in the 1990s Canada failed to do a payroll system where they paid Accenture Canada $70M.

Then in the 2010s they spent $185M on a customized version of Oracle's PeopleSoft, implemented by IBM and managed directly by a government agency: https://en.wikipedia.org/wiki/Phoenix_pay_system

And now in the 2020s they are going to spend $385M integrating an existing SaaS made by https://en.wikipedia.org/wiki/Dayforce

That's probably one of the worst and longest software failures in history.

bryanlarsen 6 hours ago | parent [-]

Oh, it's much more interesting than that. Phoenix started as an attempt to create a gun registry. Ottawa had a bunch of civil servants who'd be reasonably competent at overseeing such a thing, but the government decided that it wanted to build it in Miramichi, New Brunswick. The relevant people refused to move to Miramichi, so the project was built using IBM contractors and newbies. The resulting fiasco was highly predictable.

Then when Harper came in he killed the registry mostly for ideological reasons.

But then he didn't want to destroy a bunch of jobs in Miramichi, so he gave them another project to turn into a fiasco.

watersb 2 hours ago | parent | prev | next [-]

It's possible that most business projects fail.

Most advertising campaigns fail.

ZeroConcerns 7 hours ago | parent | prev | next [-]

Yup, and with an equal amount of mindblowing-units-of-money spent, infrastructure projects all around me are still failing as well, or at least being modified (read: downsized), delayed and/or budget-inflated beyond recognition.

So, what's the point here, exactly? "Only licensed engineers as codified by (local!) law are allowed to do projects?" Nah, can't be it, their track record still has too many failures, sometimes even spectacularly explosive and/or implosive ones.

"Any public project should only follow Best Practices"? Sure... "And only make The People feel good"... Incoherent!

Ehhm, so, yeah, maybe things are just complicated, and we should focus more on the amount of effort we're prepared to put in, the competency (or rather, the pay grade) of the staff we're willing to assign, and exactly how long we're willing to wait prior to conceding defeat?

graemep 5 hours ago | parent | next [-]

One of the problems is scale.

Large-scale systems tend to fail: large, centralised, centrally managed systems with big budgets, large numbers of people who need to coordinate, and lots of people with an interest in the project pushing and lobbying for different things.

Multiple smaller systems are usually a better approach, where possible. Not possible for things like transport infrastructure, but often possible for software.

AlexandrB 4 hours ago | parent [-]

> Not possible for things like transport infrastructure

It depends what you define as a system. Arguably a lot of transport infrastructure is a bunch of small systems linked with well-understood interfaces (e.g. everyone agrees on the gauge of rail that's going to be installed and the voltage in the wires).

Consider how construction works in practice. There are hundreds or thousands of workers working on different parts of the overall project and each of them makes small decisions as part of their work to achieve the goal. For example, the electrical wiring of a single train station is its own self-contained system. It's necessary for the station to work, but it doesn't really depend on how the electrical system is installed in the next station in the line. The electricians installing the wiring make a bunch of tiny decisions about how and where the wires are run that are beyond the ability of someone to specify centrally - but thanks to well known best practices and standards, everything works when hooked up together.

sebastos 5 hours ago | parent | prev [-]

Nailed it, but I fear this wisdom will be easily passed by by someone who doesn’t already intuit it from years of experience. Like the Isla de Muerta: wisdom that can only be found if you already know where it is.

parasubvert 5 hours ago | parent | prev | next [-]

Working on AI that helps manage IT shops by learning from failure and success might be better for both results and culture than most IT management roles, a profession that (to paint with an absurdly broad brush) tends to attract a lot of miserable creatures.

gishh 3 hours ago | parent [-]

... If this happens, the next hacks will be context poisoning. A whole cottage industry will pop up around preserving and restoring context.

Sounds miserable.

Also, LLMs don't learn. :)

darepublic 3 hours ago | parent | prev | next [-]

Is it a failure if we ship the project a year late? What if everyone involved would have predicted exactly that outcome?

semiinfinitely an hour ago | parent | prev | next [-]

> We are left with only a professional and personal obligation to reemphasize the obvious: Ask what you do know, what you should know, and how big the gap is between them before embarking on creating an IT system. If no one else has ever successfully built your system with the schedule, budget, and functionality you asked for, please explain why your organization thinks it can

Translation: "leave it to us professionals". Gate-keeping of this kind is exactly how computer science (the one remaining technical discipline still making reliable progress) could become like all of the other anemic, cursed fields of engineering. People thinking "hey, I'm pretty sure I could make a better version of this" and then actually doing it is exactly how progress happens. I hope nobody reads this article and takes it seriously.

MattRogish 5 hours ago | parent | prev | next [-]

The lesson from “big software projects are still failing” isn’t that we need better methodologies, better project management, or stricter controls. The lesson is "don't do big software projects".

Software is not the same as building in the physical world where we get economies of scale.

Building 1,000 bridges will make the cost of the next incremental bridge cheaper due to a zillion factors, even if Bridge #1 is built from sticks (we'll learn standards, stable, fundamental engineering principles, predictable failure modes, etc.) we'll eventually reach a stable, repeatable, scalable approach to build bridges. They will very rarely (in modernity) catastrophically fail (yes, Tacoma Narrows happened but in properly functioning societies it's rare.)

Nobody will say "I want to build a bridge upside-down, out of paper clips and can withstand a 747 driving over it". Because that's physically impossible. But nothing's impossible in software.

Software isn't scalable in this way. It's not scalable because it doesn't have hard constraints (like the laws of physics) - so anything goes and can be in scope; and since writing and integrating large amounts of code is a communication exercise, it suffers from diseconomies of scale.

Customers want the software to do exactly what they want and - within reason - no laws of physics are violated if you move a button or implement some business process.

Because everyone wants to keep working the way they want to work, no software project (even if it sounds the same) is the same. Your company's bespoke accounting software will be different than mine, even if we are direct competitors in the same market. Our business processes are different, org structures are different, sales processes are different, etc.. So they all build different accounting software, even if the fundamentals (GAAP, double-entry bookkeeping, etc.) are shared.

It's also the same reason why enterprise software sucks - do you think that a startup building expense management starts off being a giant mess of garbage? No! It starts off simple and clean and beautiful because their initial customer base (startups) are beggars and cannot be choosers, so they adapt their process to the tool. But then larger companies come along with dissimilar requirements and, Expense Management SaaS Co. wins that deal by changing the product to work with whatever oddball requirements they have, and so on, until the product essentially is a bunch of config options and workflows that you have to build yourself.

(Interestingly, I think these products become asymptotically stuck - any feature you add or remove will make some of your customers happy and some of your customers mad, so the product can never get "better" globally).

We can have all the retrospectives and learnings we want but the goal - "Build big software" - is intractable, and as long as we keep trying to do that, we will inevitably fail. This is not a systems problem that we can fix.

The lesson is: "never build big software".

(Small software is stuff like Bezos' two-pizza teams with APIs, etc. - many small things make a big thing)

corpMaverick 3 hours ago | parent | next [-]

I agree with you on "don't do big software projects". Especially, do not scale them out quickly to hundreds of people. You have to scale them more organically, ensuring that every person added is a net gain. They think that adding more people will reduce the time.

I am surprised by the lack of creativity when doing these projects. Why don't they start 5 small projects building the same thing and let them work for a year? At the end of the year, you cancel one of the projects and increase the funding of the other four. You can do that every year based on the results. It may look like a waste, but it will significantly increase your chances of succeeding.
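The elimination scheme above can be sketched in a few lines of Python. This is a hypothetical illustration only: the function name `run_tournament`, the yearly scores, and the even split of a cancelled project's budget among the survivors are all assumptions, since the comment doesn't say how projects would be ranked or how the freed funding would be divided.

```python
def run_tournament(budgets: dict[str, float],
                   yearly_scores: list[dict[str, float]]) -> dict[str, float]:
    """Hypothetical sketch: each year, cancel the lowest-scoring project and
    split its budget evenly among the projects still running.

    budgets: project name -> annual funding (any unit, e.g. $M)
    yearly_scores: one dict per year, higher is better; it must contain a
        score for every project still running that year.
    Returns the funding of the projects that survive.
    """
    budgets = dict(budgets)  # work on a copy; don't mutate the caller's dict
    for scores in yearly_scores:
        if len(budgets) <= 1:
            break  # only one project left, nothing to cancel
        running = {name: scores[name] for name in budgets}
        loser = min(running, key=running.get)  # ties go to the first listed
        freed = budgets.pop(loser)
        share = freed / len(budgets)
        for name in budgets:
            budgets[name] += share
    return budgets


# Five projects at 12 units each; total spend stays at 60 units throughout.
start = {name: 12.0 for name in "ABCDE"}
years = [
    {"A": 3, "B": 5, "C": 2, "D": 4, "E": 1},  # year 1: E is cancelled
    {"A": 3, "B": 5, "C": 2, "D": 4},          # year 2: C is cancelled
]
print(run_tournament(start, years))
# {'A': 20.0, 'B': 20.0, 'D': 20.0}
```

Keeping the total budget constant makes the trade-off in the comment explicit: the "waste" is the money spent on the projects that get cancelled, in exchange for several independent attempts instead of one.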

esafak an hour ago | parent | prev | next [-]

You have to be able to turn away "bad" customers.

stonemetal12 4 hours ago | parent | prev [-]

>Building 1,000 bridges will make the cost of the next incremental bridge cheaper due to a zillion factors, even if Bridge #1 is built from sticks (we'll learn standards, stable, fundamental engineering principles, predicable failure modes, etc.) we'll eventually reach a stable, repeatable, scalable approach to build bridges. They will very rarely (in modernity) catastrophically fail (yes, Tacoma Narrows happened but in properly functioning societies it's rare.)

Build 1000 JSON parsers and tell me if the next one isn't cheaper to develop with "(we'll learn standards, stable, fundamental engineering principles, predictable failure modes, etc.)"

>Software isn't scalable in this way. It's not scalable because it doesn't have hard constraints (like the laws of physics)

Uh, maybe fewer, but none is way too far. Try getting 2 billion integer operations per second out of a 286; see also the 500-mile email, big data storage, etc. Physical limits are everywhere.

>It's also the same reason why enterprise software sucks.

The reason enterprise software sucks is the lack of introspection and of learning from the garbage that went before.

shevy-java 4 hours ago | parent | prev | next [-]

I spent way less - and they still fail!

JohnMakin 7 hours ago | parent | prev | next [-]

> “Why worry about something that isn’t going to happen?”

Lots to break down in this article other than this initial quotation, but I find a lot of parallels between failing software projects, this attitude, and my recent hyper-fixation (seems to spark up again every few years), the sinking of the Titanic.

It was a combination of failures like this. Why was the captain going full speed ahead into a known ice field? Well, the boat can't sink, and there (may have been) organizational pressure to arrive in New York at a certain time (aka, an imaginary deadline must be met). Why weren't there enough lifeboats for crew and passengers? Well, the boat can't sink anyway, so why worry about something that isn't going to happen? Why train the crew properly on how to deploy the lifeboats and on emergency procedures? Same reason. Why didn't the SS Californian rescue the ship? Well, the third-party Titanic telegraph operators were under immense pressure to send telegrams to NY, and the chatter about the ice field got on their nerves, so they mostly ignored it (misaligned priorities). If even a little caution and forward thinking had been used, the death toll would have been drastically lower, if not nearly nonexistent. It took more than 2 hours to sink, which is plenty of time to evacuate a boat of that size.

Same with software projects - they often fail over a period of multiple years, and if you go back and look at how they went wrong, there are often numerous points and decisions that could have reversed course; yet often the opposite happens - management digs in even more. Project timelines are optimistic to the point of delusion and don't build failures/setbacks into schedules or roadmaps at all. I had to rescue one of these projects several years ago, and it took a toll on me that I'm pretty sure I carry to this day; I'm wildly cynical of "project management" as it relates to IT/devops.

parados 4 hours ago | parent [-]

> and my recent hyper-fixation (seems to spark up again every few years), the sinking of the Titanic.

But the rest of your comment reveals nothing beyond what anyone would find after watching James Cameron's movie multiple times.

I suggest you go to the original inquiries (the congressional inquiry in the US, the Board of Trade inquiry in the UK). There is a wealth of subtle lessons there.

Hint: Look at the Admiralty Manual of Seamanship that was current at that time and their recommendations when faced with an iceberg.

Hint: Look at the Board of Trade (UK) experiments with the turning behaviour of the sister ship. Of particular interest are the engine layout of the Titanic and the attempt by the crew, inexperienced with the ship, to avoid the iceberg. This was critical to the outcome.

Hint: Look at the behaviour of Captain Rostron. Lots of lessons there.

JohnMakin 3 hours ago | parent [-]

Thanks for your feedback, I’m well aware of the inquiries and the history there. However, this post was meant to be a simple analogy related to the broader topic, not a deep dive into the theories of how and why the Titanic sank. Thanks!

parados 3 hours ago | parent [-]

Got it. Thanks.

skywhopper an hour ago | parent | prev | next [-]

Because you don’t just rewrite all your payroll systems with hundreds of variations in one go. That will never work. But they keep trying it.

You update the system for one small piece, while reconciling with the larger system. Then replace other pieces over time, broadening your scope until you have improved the entire system. There is no other way to succeed without massive pain.
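The incremental approach above resembles what is often called a "parallel run" (and, for gradual piece-by-piece replacement, the "strangler fig" pattern); those names are my framing, not the commenter's. A minimal sketch, assuming both the legacy and the new system can be called as functions from a hypothetical employee ID to net pay for one period. The legacy result stays authoritative, and the new system's slice is only widened once its mismatch list stays empty.

```python
from decimal import Decimal
from typing import Callable, Iterable

# Hypothetical interface: each system maps an employee ID to net pay for one period.
PayCalc = Callable[[str], Decimal]


def reconcile(
    employee_ids: Iterable[str],
    legacy: PayCalc,
    new: PayCalc,
    tolerance: Decimal = Decimal("0.00"),
) -> dict[str, tuple[Decimal, Decimal]]:
    """Run both systems on the same employees and return every mismatch.

    The legacy result remains the system of record; the new system only
    shadows it, so a mismatch costs an investigation rather than a bad paycheck.
    """
    mismatches: dict[str, tuple[Decimal, Decimal]] = {}
    for emp in employee_ids:
        old_pay, new_pay = legacy(emp), new(emp)
        if abs(old_pay - new_pay) > tolerance:
            mismatches[emp] = (old_pay, new_pay)
    return mismatches
```

In this sketch, each pay period you would run `reconcile` over the slice currently being migrated; a non-empty result is treated as a bug report, not as a payroll run.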

oldandboring 5 hours ago | parent | prev | next [-]

Almost nobody who works in software development is a licensed professional engineer. Many are even self-taught, and that includes both ICs and managers. I'm not saying this is direct causation but I do think it odd that we are so utterly dependent on software for so many critical things and yet we basically YOLO its development compared to what we expect of the people who design our bridges, our chemicals, our airplanes, etc.

keeda 4 hours ago | parent [-]

Licensing and the perceived rigor it signifies is irrelevant to whether something can be considered "professional engineering." Engineering exists at the intersection of applied science, business and economics. So most software projects can be YOLO'd simply because the economics permit it, but there are others where the high costs necessitate more rigor.

For instance, software in safety-critical systems is developed very rigorously. However, that level of investment does not make sense for run-of-the-mill internal LOB CRUD apps, which constitute the vast majority of the dark matter of the software universe.

Software engineering is also nothing special when it comes to various failure modes, because you'll find similar examples in other engineering disciplines.

I commented about this at length a few days ago: https://news.ycombinator.com/item?id=45849304

runningmike 4 hours ago | parent | prev | next [-]

There is no such thing as ‘simplicity science’ that can be directly applied when dealing with IT problems. However, many insights from complexity science are applicable to solving real-world IT problems. People love simple solutions. However, simple is a scam: https://nocomplexity.com/simple-is-a-scam/

There are no generic, simple solutions for complex IT challenges. But there are ground rules for finding and implementing simple solutions. I have created a playbook to prevent IT disasters, "The art and science towards simpler IT solutions"; see https://nocomplexity.com/documents/reports/SimplifyIT.pdf

csours an hour ago | parent | prev | next [-]

There are 2 big problems with large software projects:

1. Connecting pay to work - estimates (replanning is learning, not failure)

2. Connecting work to pay - management (the world is fractal-like, scar tissue and band-aids)

I do not presuppose that there are definite solutions to these problems - there may be solutions, but getting there may require going far out of our way. As the old farmer said, "Oh, I can tell you how to get there, but if I was you, I wouldn't start from here"

1. Pay to Work - someone is paying for the software project, and they need to know how much it will cost. Thus estimates are asked for, an architecture is asked for, and the architecture is tied to the estimates.

This is 'The Plan!'. The project administrators will pick some lifecycle paradigm to tie the architecture to the cost estimate.

The implementation team will learn as they do their work. This learning is often viewed as failure, as the team will try things that don't work.

The implementation team will learn that the architecture needs to change in some large ways and many small ways. The smallest changes are absorbed in regular work. Medium and large changes will require more time (thus money); this request for more money will be viewed as a failure in estimation and not as learning.

2. Work to Pay - as the architecture is implemented, development tasks are completed. The Money People want Numbers, because Money People understand how they feel about Numbers. Also these Numbers will talk to other Numbers outside the company. Important Numbers with names like Share Price.

Thus many layers of management are chartered and instituted. The lowest layer of management is the self-managed software developer. The software developer will complete development tasks related to the architecture, tied to the plan, attached to the money (and the spreadsheets grew all around, all around [0]).

When the developer communicates about work, the Management Chain cares to hear about Numbers, but sometimes they must also involve themselves in failures.

It is bad to fail, especially repeated failures at the same kind of task. So managers institute rules to prevent failures. These rules are put in a virtual cabinet, or bureau. Thus we have Rules of the Bureau or Bureaucracy. These rules are not morally bad or good; not factually incorrect or correct, but whenever we notice them, they feel bad; we notice the ones that feel bad TO US. We are often in favor of rules that feel bad to SOMEONE ELSE. You are free to opt out of this system, but there is a price to doing so.

----

Too much writing, I desist from decoding verbiage:

Thus it is OK for individuals to learn many small things, but it is a failure for the organization to learn large things. Trying to avoid and prevent failure is viewed as admirable; trying to avoid learning is self-defeating.

----

0. https://www.google.com/search?q=the+green+grass+grew+all+aro...

> git commit -am "decomposing recapitulating and recontextualizing software development bureaucracy" && git push

csours an hour ago | parent [-]

Bureaucracy is: scar tissue, someone else's moat, someone else's data model

nacozarina 5 hours ago | parent | prev | next [-]

managing software requirements and the corresponding changes to user/group/process behaviors is by far the hardest part of software development, and it is a task no one knows how to scale.

absent understanding, large companies engage in cargo cult behaviors: they create a sensible org chart, produce a Gantt chart, have the coders start whacking code, and presumably in 9 months a baby comes out.

every time, ugly baby

satisfice 3 hours ago | parent | prev | next [-]

Systematic decimation of test teams, elimination of test managers, and contemptuous treatment of the role of tester over the past 40 years has not yet led to a more responsible software industry. But maybe if we started burning testers at the stake all these problems will go away?

amai 5 hours ago | parent | prev | next [-]

To stop failing we could use AI to replace managers not software developers.

an0malous 4 hours ago | parent [-]

No need to waste GPUs, a simple bash script that alternates between asking for status updates and randomly changing requirements would do

supportengineer 7 hours ago | parent | prev | next [-]

The purpose of a system is what it does.

1. Enable grift to cronies

2. Promo-driven culture

3. Resume-oriented software architecture

franktankbank 7 hours ago | parent | prev | next [-]

> Phoenix project executives believed they could deliver a modernized payment system, customizing PeopleSoft’s off-the-shelf payroll package to follow 80,000 pay rules spanning 105 collective agreements with federal public-service unions.

I come away skeptical of the inevitable conclusion that Phoenix was doomed to fail, and suspect instead that they were hamstrung by architectural constraints dictated by assholes.

QuercusMax 5 hours ago | parent | next [-]

Wasn't the Agile movement kicked off by a group of people writing payroll software for Chrysler?

https://en.wikipedia.org/wiki/Chrysler_Comprehensive_Compens...

Payroll systems seem to be a massively complicated beast.

array_key_first 3 hours ago | parent | next [-]

Arbitrary payroll is absurdly complicated. The trick is to not make it arbitrary - have a limited set of things you do, and always have backdoors for manually pushing data through payroll.
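A minimal sketch of the "limited rules plus a manual backdoor" idea, assuming just three hypothetical rule kinds (hourly pay, flat bonus, percentage deduction) and an audited list of manual adjustments. None of these names come from a real payroll product; the point is that the set of rule kinds is closed, and anything outside it goes through the explicit manual path with a recorded reason.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class RuleKind(Enum):
    # Deliberately closed set: adding a kind is a code change and a
    # conversation, not one more config option.
    HOURLY = "hourly"                        # amount = rate per hour
    FLAT_BONUS = "flat_bonus"                # amount = fixed sum per period
    PERCENT_DEDUCTION = "percent_deduction"  # amount = percent of gross


@dataclass
class Rule:
    kind: RuleKind
    amount: Decimal


@dataclass
class Paycheck:
    gross: Decimal = Decimal("0")
    deductions: Decimal = Decimal("0")
    # (amount, reason) pairs pushed through by a person, kept for audit
    manual_adjustments: list[tuple[Decimal, str]] = field(default_factory=list)

    @property
    def net(self) -> Decimal:
        manual = sum((amount for amount, _ in self.manual_adjustments), Decimal("0"))
        return self.gross - self.deductions + manual


def compute(rules: list[Rule], hours: Decimal) -> Paycheck:
    pay = Paycheck()
    # Earnings first, so percentage deductions see the full gross.
    for rule in rules:
        if rule.kind is RuleKind.HOURLY:
            pay.gross += rule.amount * hours
        elif rule.kind is RuleKind.FLAT_BONUS:
            pay.gross += rule.amount
    for rule in rules:
        if rule.kind is RuleKind.PERCENT_DEDUCTION:
            pay.deductions += pay.gross * rule.amount / 100
    return pay


def manual_adjust(pay: Paycheck, amount: Decimal, reason: str) -> None:
    """The backdoor: a person pushes an amount through, with a recorded reason."""
    pay.manual_adjustments.append((amount, reason))
```

For example, 40 hours at 20/hour plus a 100 bonus with a 10% deduction gives a gross of 900 and a net of 810; a one-off correction then goes through `manual_adjust` instead of becoming a new rule kind.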

franktankbank 5 hours ago | parent | prev [-]

You don't want to get me started on Agile.

ruralfam 6 hours ago | parent | prev [-]

My reaction also. 80K payroll rules!!! With minimal prompting effort, I got a figure of about 350K Canadian federal public service employees (sorry if not correct).

dmix 6 hours ago | parent [-]

Sounds like they put zero effort into simplifying those rules the first time around.

Now, in the new project, they've put together a committee to attempt it:

> The main objective of this committee also includes simplifying the pay rules for public servants, in order to reduce the complexity of the development of Phoenix's replacement. This complexity of the current pay rules is a result of "negotiated rules for pay and benefits over 60 years that are specific to each of over 80 occupational groups in the public service." making it difficult to develop a single solution which can handle each occupational groups specific needs.

stackskipton 6 hours ago | parent | next [-]

I have worked on government payroll systems; simplifying those rules is almost impossible from a political PoV. They are generally a combo of weird laws, court cases, union contracts and more.

Any time you think about touching them, the people who get those salaries come out in droves and no one else cares, so the government has every incentive to leave them alone.

tehjoker 6 hours ago | parent [-]

You could simplify them if you made sure the people getting them got overall more money ;) The government doesn't want to do that though.

franktankbank 6 hours ago | parent | prev [-]

Oh great a committee!

AndrewDucker 6 hours ago | parent [-]

Committees are how you discover what the problems are and agree solutions.

No single person is going to understand all of the history and legality involved, or be able to represent the people on all sides of this mess.

Yes, this means discussion, investigation, almost certainly months of effort to find something that works, and lots of compromise. That's how adults deal with complex situations.

locallost 2 hours ago | parent | prev | next [-]

Worth a view also. Is software engineering still an oxymoron?

https://youtu.be/D43PlUr1x_E?si=em2nNYuI8WDvtP21

827a 4 hours ago | parent | prev | next [-]

Slightly related but unpopular opinion I have: I think software, broadly, is of the highest quality today that it's ever been. People love to hate on specific issues, like how the Windows file explorer takes 900ms to open instead of 150ms, or how an iOS 26 Liquid Glass animation is sometimes a bit janky... we're complaining about minutiae instead of seeing the whole forest.

I trust my phone to work so much that it is now the single, non-redundant source for keys to my apartment, keys to my car, and payment method. Phones could only even hope to do all of these things as of like ~4 years ago, and only as of ~this year do I feel confident enough to not even carry redundancies. My phone has never breached that trust so critically that I feel I need to.

Of course, this article talks about new software projects. And I think the truth and reason of the matter lies in this asymmetry: Android/iOS are not new. Giving an engineering team agency and a well-defined mandate that spans a long period of time oftentimes produces fantastic software. If that mandate often changes; or if it is unclear in the first place; or if there are middlemen stakeholders involved; you run the risk of things turning sideways. The failure of large software systems is, rarely, an engineering problem.

But, of course, it sometimes is. It took us ~30-40 years of abstraction/foundation building to get to the pretty darn good software we have today. It'll take another 30-40 years to add one or two more nines of reliability. And that's ok; I think we're trending in the right direction, and we're learning. Unless we start getting AI involved; then it might take 50-60 years :)

user3939382 2 hours ago | parent | prev | next [-]

How much money do you need to build a skyscraper on top of a tarpit? None because it’s not possible. The whole stack has to be gutted. I can do it but no one wants to listen so I’ll do it myself.

add-sub-mul-div 6 hours ago | parent | prev | next [-]

An endless succession of new tools, methodologies, and roles but failure persists because success is rooted in good judgment, wisdom, and common sense.

mschuster91 4 hours ago | parent | prev | next [-]

No big surprise. Taking a shitty process and "digitalizing" it will lead to a shitty process just on computers in the best case, in the worst case everything collapses.

mariopt 7 hours ago | parent | prev | next [-]

> IT projects suffer from enough management hallucinations and delusions without AI adding to them.

Software is also incredibly hard: the human mind understands physical space very well, but once we're deep into abstractions it simply struggles to keep up.

It is easier to explain to virtually anyone how to build a house from scratch than how to build a mobile app or Excel.

apercu 6 hours ago | parent | next [-]

I came to the opposite conclusion. Technology is pretty easy; people are hard, and the business culture we have fostered in the last 40 years gets in the way of success.

tehjoker 6 hours ago | parent | prev [-]

Easy, just imagine a 1GB array as a 2.5mm long square in RAM (assuming a DRAM cell is 10nm). Now it's physical.
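A quick check of that arithmetic, assuming one bit per cell laid out on a square grid and "1GB" meaning 2^30 bytes. Taking 10 nm as the cell pitch gives a side of roughly 0.93 mm; reading 10 nm instead as the feature size F of a conventional 6F² DRAM cell gives roughly 2.27 mm, close to the figure in the comment. Which reading was intended is an assumption, and `square_side_mm` is a hypothetical helper.

```python
import math


def square_side_mm(n_bytes: int, cell_pitch_nm: float) -> float:
    """Side of a square grid storing one bit per cell, in millimetres."""
    bits = n_bytes * 8
    cells_per_side = math.sqrt(bits)
    return cells_per_side * cell_pitch_nm * 1e-6  # 1 mm = 1e6 nm


GIB = 2**30  # assuming "1GB" means 1 GiB

print(round(square_side_mm(GIB, 10), 2))                 # 0.93 (10 nm pitch)
print(round(square_side_mm(GIB, 10 * math.sqrt(6)), 2))  # 2.27 (6F² cell, F = 10 nm)
```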

apercu 6 hours ago | parent | prev | next [-]

Hot take: It's not technical problems causing these projects to fail.

It's leadership and accountability (well, the lack of them).

AnimalMuppet 4 hours ago | parent [-]

And that often takes a particular form: The requirements never converge, or at least never converge on anything realistically buildable.

AtlasBarfed 4 hours ago | parent | prev | next [-]

Software was failing and mismanaged.

So we added a language and cultural barrier, 12 hour offset, and thousands of miles of separation with outsourcing.

Software was failing and mismanaged.

So now we will take the above failures, and now tack on an AI "prompt engineering" barrier (done by the above outsourced labor).

And on top of that, all the engineers who know what they are doing are being devalued by the market, and all the newer engineers will be AI-braindead.

Everything will be fixed!

lawlessone 5 hours ago | parent | prev | next [-]

Every improvement will be moderated increased demands from management, crunch, pressure to release, "good enough", add this extra library that monetizes/spies on the customer, etc.

In the same way that hardware improvements are quickly gobbled up by more demanding software.

The people doing the programming will also be more removed technically. I can do Python, Java, Kotlin. I can do a little C++, less C, and a lot less assembly.

lawlessone 40 minutes ago | parent [-]

will be moderated by* increased demands.

x0x0 5 hours ago | parent | prev | next [-]

The article is kind of dumb. E.g., it hangs its hat on the Phoenix payroll system:

> Phoenix project executives believed they could deliver a modernized payment system, customizing PeopleSoft’s off-the-shelf payroll package to follow 80,000 pay rules spanning 105 collective agreements with federal public-service unions. It also was attempting to implement 34 human-resource system interfaces across 101 government agencies and departments required for sharing employee data.

So basically people -- none of them in IT, but rather working for the government -- built something extraordinarily complex (80k rules!), and then act surprised that anything downstream would be at least equally complex. And then the article blames IT in general, when this data point tells us that replacing a business process that used to require (per [1]) 2,000 pay advisors will be complex, especially in an organization that has shit the bed so thoroughly that paying its employees requires 2k people. For an organization of 290k, that's roughly 0.7% of headcount spent on paying employees!

IT is complex, but incompetent people and incompetent orgs do not magically become competent when undertaking IT projects.

Also, making extraordinarily complex things and then shouting the word "computer" at them like you're playing D&D and it's a spell does not make them simple.

[1] https://www.oag-bvg.gc.ca/internet/English/parl_oag_201711_0...

jmyeet 6 hours ago | parent | prev | next [-]

This has dot-com bubble written all over it. But there are some deeper issues.

First, we as a society should really be scrutinizing what we invest in. Trillions of dollars could end homelessness as a rounding error.

Second, real people are going to be punished for this as the layoffs go into overdrive, people lose their houses and people struggle to have enough to eat.

Third, the ultimate goal of all this investment is to displace people from the labor pool. People are annoying. They demand things like fair pay, safe working conditions and sick leave.

Who will buy the results of all this AI if there’s no one left with a job?

Lastly, the externalities of all this investment are indefensible. For example, air and water pollution and rising utility prices.

We’re barreling towards a future with a few thousand wealthy people, where everyone else lives in worker housing, owns nothing, and is the next incarnation of brick kiln workers on wealthy estates.

ctoth 6 hours ago | parent | next [-]

Systemically, how would you solve homelessness, if I gave you a trillion dollars?

jddj 5 hours ago | parent [-]

A trillion in a money market fund @ 5% is 50B/year.

Over the course of a few years (so as to not drive up the price of politicians too quickly) one could buy the top N politicians from most countries. From there on out your options are many.

After a decade or so you can probably have your trillion back.

ctoth 2 hours ago | parent [-]

I really do like this answer, but it would seem to be anti-inductive: the cost of current politicians is so low because nobody is doing it at scale, but if someone did, that would force other people to ... well, it's an interesting thought experiment at least!

tonyedgecombe 6 hours ago | parent | prev [-]

The article isn't really about AI (for a change).

exabrial 6 hours ago | parent | prev [-]

The biggest reason is developer ego. Devs see their code as artwork, an extension of themselves, so it's really hard to have critical conversations about small things, and they erupt into holy wars. Off the top of my head:

* Formatting

* Style

* Conventions

* Patterns

* Using the latest frameworks or what's en vogue

I think where I've seen results delivered effectively and consistently is where a universal style is enforced, which removes the individualism from the codebase. Some devs will not thrive in that environment, but it makes the code a means to an end, rather than the end itself.

AlotOfReading 6 hours ago | parent | next [-]

As far as I can see in the modern tech industry landscape, virtually everyone has adopted style guides and automatic formatting/linting. Modern languages like Go even bake those decisions into the language itself.

I'd consider managing that stuff essentially table-stakes in big orgs these days. It doesn't stop projects from failing in highly expensive and visible ways.

ctoth 6 hours ago | parent | prev | next [-]

The UK Post Office lied and made people kill themselves ... because of dev ego?

exabrial 5 hours ago | parent | prev [-]

Ironically, the downvotes pretty much prove this is exactly correct.

parasubvert 5 hours ago | parent [-]

Eh, you're not wrong, but management failures tend to be a bigger issue. On the hierarchy of ways software projects fail, developer ego is kind of upper-middle of the pack rather than top. Delusional, ignorant, or sadistic leadership tends to be higher.