simonw 5 hours ago

Firing people for bad architectural decisions is generally a terrible idea - especially decisions that shipped and ran in production for several years.

This article also doesn't make a convincing case for this being a huge mistake. Companies like Uber change their architectural decisions while they scale all the time. Provided it didn't kill the company, stuff like this becomes part of the story of how they got to where they are.

Related: the classic line commonly attributed to original IBM CEO Thomas John Watson Sr:

“Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?”

https://blog.4psa.com/quote-day-thomas-john-watson-sr-ibm/

alemanek 4 hours ago | parent | next [-]

Also the article doesn’t attempt to explore the business and resourcing constraints they were operating under at the time.

I have been in situations where I was told “don’t worry about cost just get it done”. Then a few years later the business constraints shift and now we need to “worry about the cost”. It ignores that decisions made under a different set of constraints were correct, or at least reasonable, at the time but things change.

One of my pet peeves is when people say “do it right the first time” but the definition of “right” often changes over time. If the only major flaw of this design was that it was expensive, then I am much more skeptical that it was wrong given the original set of conditions that they were operating under.

jakevoytko 3 hours ago | parent | next [-]

Yeah, this is exactly what I thought when I read this post. It seemed like the author either hasn't worked in big tech, or hasn't worked in the industry very long. It's extremely likely that the engineer who designed this was standing on his desk shouting "it's going to cost THIS MUCH MONEY. I want to make sure that EVERYONE IS OK WITH THIS." and was met with shrugs.

Here's how a big tech reporting chain sees this situation when everything is smooth sailing: "We're growing 3x year-over-year? After 2 years, the cost will be an order of magnitude higher no matter what solution we pick. The constant factor doesn't matter that much. But we have such an incredible roadmap that we will book more than an order of magnitude of revenue, backed by this new ledger project. The cost will always be a nonissue because of growth."
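As a rough check on that arithmetic, here is a minimal sketch. Only the 3x yearly growth and the 2-year horizon come from the comment above; the baseline cost of 1.0 and the 2x "constant factor" for a pricier solution are made-up illustration values.

```python
# Figures from the comment: 3x growth per year, over 2 years.
GROWTH_PER_YEAR = 3
YEARS = 2


def projected_cost(initial_cost: float, constant_factor: float = 1.0) -> float:
    """Cost after YEARS of compounding growth, scaled by a per-solution constant factor."""
    return initial_cost * constant_factor * GROWTH_PER_YEAR ** YEARS


# Growth alone multiplies cost by 3**2 = 9, roughly an order of magnitude.
growth_multiple = GROWTH_PER_YEAR ** YEARS

# Hypothetical comparison: a baseline solution vs. one that costs twice as much per unit.
cheap = projected_cost(1.0)
pricey = projected_cost(1.0, constant_factor=2.0)

print(f"growth multiple: {growth_multiple}x")
print(f"baseline after {YEARS} years: {cheap}, pricier option: {pricey}")
```

Under these assumptions the growth term (9x) outweighs the assumed 2x difference between solutions, which is the reasoning the reporting chain is described as using.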

And then 2 years go by, and this incredible product growth adds a bunch of ledger entries that weren't there 2 years ago. Someone nudges your reporting chain with the question, "this is pretty expensive... what gives?" and then someone with a good combination of social and technical skills points out that a migration to your existing storage solution would be a cost-effective way to continue growing.

At every step of the way, everyone is generally happy with what's going on.

conductr 2 hours ago | parent | prev | next [-]

Also totally possible that it was just an unpublished partnership of sorts between AWS and Uber. AWS wants the logo and a big case study implementation to give the product some credibility or a boost. Uber may not have been charged at all, may have even been paid to use AWS. The Uber developer may not have even known, and was just given an edict to build it on DynamoDB.

basilgohar 4 hours ago | parent | prev [-]

I think it's important for leadership to clearly define what right is in these cases, too, otherwise, you get as many definitions of "right" as you have people, times, and places.

Easy to say, but there's a real human cost to relying on people to figure out what you mean rather than explaining what you mean. Not enough time is spent on cultivating effective communication and training. Everyone wants everything done yesterday and doesn't feel like investing in their own people.

pjc50 3 hours ago | parent | prev | next [-]

It probably was an unnecessary redesign that could have been avoided, but hey: at least it worked, and eight million dollars is not a huge amount for Uber.

Birmingham spent almost £150m for a system that didn't work at all:

https://www.theregister.com/2026/01/29/birmingham_oracle_lat...

While I was an undergraduate, my university also spent £9m on accounting that didn't work, also with Oracle: http://news.bbc.co.uk/1/hi/education/1634558.stm

If you've designed a system in house for your accounting, it works, makes neither financial nor software errors, is accepted by the users, and got away with it costing a relatively small fraction of your turnover? That's a big win.

arethuza 3 hours ago | parent [-]

ERP implementations probably don't fail for those kinds of architectural reasons?

hilariously 5 hours ago | parent | prev | next [-]

Do you think that the social climbers who approved these obviously crappy projects learned anything?

I have worked with all levels of engineers who come into a project glassy eyed about some technology, sure, but if you are part of the team approving a project and you can't produce a realistic budget then your management is bogus as hell.

I have worked on a ton of these vanity projects, and when I voice my concerns it's clear nobody is out to learn anything, they are here to look good and avoid looking bad, that's about it.

Get some articles published, go to some conferences, get a new job with a new title somewhere else, laugh on your way out.

pc86 5 hours ago | parent | next [-]

> Do you think that the social climbers who approved these obviously crappy projects learned anything?

Just the framing of this question makes it seem like you simply don't like people in management / decision-makers, and you want something bad to happen to them. Maybe that's wrong, hopefully it is, but the rest of the comment doesn't do much to dissuade me of that impression either.

Aurornis 5 hours ago | parent | next [-]

Cutting down anyone who gets a promotion or finds success is a culture in itself (see tall poppy syndrome for example). Factual accuracy is not a concern; they only want to be angry at people in higher positions.

hilariously 4 hours ago | parent | prev [-]

Something bad to happen to "them"? There's no diaphanous them, just the specific social climbing crap decision makers facing no consequences of any type.

I have worked with many hard working and caring managers, and they are generally eclipsed by said social climbers presenting at conferences every other week about know-nothing topics jumping from place to place leaving bankrupt companies and massive layoffs in their wake.

I see them posting on LI right now :)

ljm an hour ago | parent | next [-]

Why are you thinking more about the people that piss you off than the ones that you consider hard working and caring?

You have a massive chip on your shoulder, dare I say that's why you've had many caring managers and now you're seeing them all as 'social climbers'.

Did one manager call you out on something and you torched the entire thing?

tclancy an hour ago | parent | prev [-]

>There's no diaphanous them

Autocorrect mistake? I doubt anyone was imagining semi-transparent beings wafting gently in a summer breeze.

So what would you call your alternative to blameless postmortems? FWIW, "walking the plank" is already in use.

simonw 5 hours ago | parent | prev [-]

I've certainly learned a great deal from my own crap glassy-eyed decisions throughout my career.

robertlagrant 5 hours ago | parent | prev | next [-]

I agree. It is a lot of money, but that's the hope from paying engineers well: to make very expensive mistakes unlikely.

One thing I did think about was how this could have been architected without sufficient reference to costs, which might have been a process or structure improvement.

simonw 5 hours ago | parent [-]

Right - if your engineering organization ships designs that are bad economically, the solution is to introduce a culture of predicting costs before committing to a design, and processes to help enforce that culture.

Add "expected budget, double-checked by at least one other principal engineer" to the project checklist.

Have the person most responsible for the $8m "mistake" be the person to drive that cultural change, since they now have the most credibility for why it's a useful step!

havnagiggle 5 hours ago | parent | prev | next [-]

I went to school with a guy that dropped a $100k-200k VNA at Apple during an internship. He didn't get a full-time offer despite their investment :P

Aurornis 5 hours ago | parent | next [-]

Letting interns carry six figure equipment, which would also be unexpectedly heavy especially if this happened some years ago, would be a weird thing for any lab I’ve worked in. There are too many things that can predictably go wrong in the hands of an inexperienced person, as happened here.

Interns wouldn’t even be allowed to use $100K VNAs without a lot of supervision because so many things can go wrong. Damaging one of those small precision connectors is easy to do and can be a costly repair that brings delays to the lab, and that’s before you even start making measurements.

I wonder if part of the offense was that the intern was breaking protocol by moving the equipment. Alternatively they probably failed to explain the rules and expectations to the intern. Or maybe some lazy engineer tried to pawn off their work on to an intern without thinking about the consequence.

noodlesUK 4 hours ago | parent [-]

I'm not sure - the level of scrutiny that usage/abusage of expensive equipment gets varies wildly from organisation to organisation. I've worked in some places where very expensive equipment is handled roughly, or even taken home in some cases. In others, there are meticulous procedures for even $1-5k pieces of equipment. It's just a cultural thing.

Aurornis 4 hours ago | parent [-]

For this example it’s the delicacy and fragility of the instrument; the price is just a proxy for that.

Expensive VNAs are also precision, calibrated instruments with small connectors that can easily be degraded by even simple misuse. Frontends can be destroyed, or subtly damaged in ways that break measurements, by allowing the wrong signal to enter.

It’s easy to damage one in a way that will interfere with measurements for months before someone realizes what’s wrong, which is more costly than the VNA itself.

These instruments require training to handle. It’s not even about the price, it’s absurd that they’d let an intern carry one around at all (if it was allowed)

This is like the hardware equivalent of an intern accidentally dropping the production DB. My first question would be how they got to the point where an intern was in a position to be able to drop the production DB because everyone understands what can go wrong

mrWiz 3 hours ago | parent | next [-]

The obvious answer is because VNAs are heavy and the person who would otherwise have to carry it isn't the person who has to pay for a replacement.

noodlesUK 4 hours ago | parent | prev [-]

Fair enough. Fragility is probably more important than price in this scenario.

mannykannot 5 hours ago | parent | prev [-]

I cannot, of course, speak about this particular incident, but a person inclined to skip procedures expressly implemented to avoid the problem which occurred, or who ignores clear warnings that a problem is developing, is a liability, not a trained asset.

vkou 3 hours ago | parent | prev | next [-]

Meanwhile, in a sibling thread about an accounting mistake in California, everyone is screaming for blood.

Blame-free post-mortems are for me and mine, everyone else can get fucked.

embedding-shape 5 hours ago | parent | prev [-]

> Firing people for bad architectural decisions is generally a terrible idea

I mean, if we're considering factors that could make us fire a developer, suggesting, pushing and eventually failing to implement bad designs and architectures probably ranks among some of the more reasonable reasons for firing them. It doesn't seem to have been "Oops we used MariaDB when we should have used MySQL" but more like "We made a bad design decision, let's cover it up with another bad design decision" and repeat, at least judging by this part:

> So let me get this straight: DynamoDB was a bad choice because it was expensive, which is something you could have figured out in advance. You then decided to move everything to an internal data store that had been built for something else, that was available when you decided to build on top of DynamoDB. And that internal data store wasn’t good on its own, so you had to build a streaming framework to complete the migration.

But on the other hand, I'd probably fire the manager/executive responsible for that move, rather than the individual developer who probably suggested it.

otherme123 5 hours ago | parent [-]

> But on the other hand, I'd probably fire the manager/executive responsible for that move, rather than the individual developer who probably suggested it.

And you just taught all your workers to be so cautious that they freeze: never be proactive, keep the status quo as much as they can, avoid being noticed, and never take a step without being forced or having someone else to take 100% of the blame (with a paper trail) if things go south.

data-ottawa 4 hours ago | parent | next [-]

One of my favourite bosses ever was a VP who kept a banker's box at her desk and very few personal effects.

She told me she kept it there because her job was to make decisions and get fired or leave if she was wrong. She was right about so many of her choices, I would have followed her into anything. Then one day I came in and her desk was empty -- she had an apparently epic argument with the C suite and disagreed with their path so she left (never found out if that was a quit or fired). The team got a new VP, but I requested to be moved to a different team as I wasn't aligned with the new vision.

When you get to a certain level part of your job becomes owning the decisions and getting fired.

BeetleB 3 hours ago | parent | prev | next [-]

And in some workplaces, that actually is the way to go!

I once worked in a manufacturing environment where mistakes could be quite expensive. We had our annual org survey and one of the questions asked was "Risk taking is encouraged." Our team scored low on that metric, and upper management was concerned. They held a meeting to ask about it, and most of the team was confused why there was a meeting. They said they viewed it as a positive that they don't take risks.

embedding-shape 4 hours ago | parent | prev [-]

I guess if that's your experience of letting toxic people go, maybe everyone you worked with was toxic? The usual reaction I see from teams when firing people who seem to make a project/product worse instead of better tends to be a sigh of relief and a communal feeling of "Let's get back to business".

Firing people making bad choices, people tend to appreciate that. Firing people making good choices? Yeah, I'd understand that would freeze people and make them avoid making proactive choices, try to not do that obviously.

pjc50 3 hours ago | parent [-]

No, he's right.

Remember you can conduct only one of the two different types of postmortem, the air crash style blameless one (to find out what happened) and the blame-based one (to find out who to punish). Once you conduct the latter, everyone psychologically "lawyers up". You get a lot more meetings. A lot more paper trail. A lot more delay. You don't just pick a database, you commission a sub-committee for database choice to review the available options over the next six months.

That's why government / civil service operations are so slow. They operate in a very high blame political environment.

embedding-shape 3 hours ago | parent [-]

Right, so say we have this situation where you're choosing a SQL database. The organization made a choice that leads to lots of complications, where oftentimes the reason for the complication is that the organization made yet another bad choice. Repeat a couple of times.

We do a blameless postmortem about each one of these, where essentially we only focus on the root causes of the actual problems, but somehow it never comes up that there was one individual who made those bad choices over and over, which led to the situations arising in the first place.

Do you just never address this? Do you continue to say "Well, it wasn't X's fault, it's the system around X that let X make that decision that needs fixing" even when it repeats, and the humans involved can already see what's going on?

In my mind you need to be able to address bad behavior in organizations where choices have an impact on something produced, otherwise we cannot change the quality of what is being produced, or prevent production issues, since it's based on the choices we make, and if "we" make bad choices, the quality will be bad.

Ultimately I agree with you in more serious engineering-heavy domains, like airplanes and what not, and it's a sane default mode, to try to address what's happening around rather than decisions by individuals. But I also don't think that should mean that other domains aren't better served by some hybrid model, especially when it's about producing artifacts of some sort, and similar things.

otherme123 2 hours ago | parent [-]

>was one individual who made those bad choices over and over

This was never said, or even implied, in the article. We don't even know if this was a single person's choice.

You are making up "facts" like calling the person who makes mistakes "toxic", or saying that the choice was made by someone who only made bad choices.

We are talking Uber here, in 2017, which was not only playing "move fast and break things" but "move really fast while shooting an AK47 blindfolded". Not only did they expect mistakes, they encouraged them. It would be plain wrong to start firing individual people for making mistakes if that is the environment.