pvtmert 3 days ago

IMO, rushing things never helps. When possible, I look into external calls/meetings well in advance; at worst, I add a 30-minute calendar block before them (to prepare and install/update things).

As a DevOps engineer, I have seen the quote "premature optimisation is the root of all evil" play out in real life quite often. In fact, optimising one bottleneck quickly exposes another (moving the goalpost further), potentially increasing the business impact if the flow is not contained properly.

Especially during incidents, _rushing_ to fix things often creates more problems. I've seen people isolate or shut down mildly misbehaving instances, which puts excessive load on the remaining ones and starts a cascading failure, like dominoes falling one after another.
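
A toy model of that domino effect, as a rough sketch only. It assumes load is spread evenly across instances, that an instance fails outright once it is pushed past its capacity, and that nothing sheds load; the function name and all numbers are made up for illustration.

```python
def surviving_instances(instances: int, total_load: float, capacity: float) -> int:
    """Return how many instances are still up once load redistribution settles.

    Hypothetical model: load is split evenly, and an instance dies as soon as
    its share exceeds `capacity`.
    """
    alive = instances
    while alive > 0 and total_load / alive > capacity:
        # One overloaded instance falls over; its share moves to the rest,
        # which only makes the remaining ones more overloaded.
        alive -= 1
    return alive


# Five instances at 84% utilisation each: everything is fine.
print(surviving_instances(5, total_load=420, capacity=100))  # 5

# Someone shuts down one "mildly misbehaving" instance: the other four
# now get 105 each, go over capacity, and the whole fleet goes down.
print(surviving_instances(4, total_load=420, capacity=100))  # 0
```

In this model, once the per-instance share crosses capacity there is no stable point short of zero, which is why removing a single instance at high utilisation can take out everything.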

Which reminds me of a scene from "The Office", where Dwight goes rogue and conducts a "fire drill" by locking the doors and deliberately causing smoke. Everyone panics and all hell breaks loose. It is at the beginning of the episode, maybe 5 minutes tops. I show it in incident-management training, because this is how people behave in real life. No joke.

To give a more concrete example of the moving goalpost: SWEs improve transaction processing with multi-threading, but that causes more connections/transactions to the database. Even though the theoretical gain is Nx (N being the number of threads/cores), real-life gains are 1.2x-1.3x, because the database connections are all occupied. As the next step, increasing the number of DB connections helps, and maybe adding another master node (the risk of deadlocks increases, but ignore that for the sake of argument). But then disk IO becomes the bottleneck, because the workload is write-heavy (payments domain). Then we add Redis to reduce the load, and maybe some asynchronous processing. At this point complexity has increased, and we need to solve rare occurrences of duplicate data or race conditions, because it is not a single-threaded process anymore...
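
A back-of-the-envelope sketch of why the Nx gain collapses to roughly 1.2x-1.3x. It assumes each transaction spends some time on CPU and some time holding a DB connection, and that throughput is capped by whichever runs out first: threads or connections. The function name and the timings (2 ms CPU, 8 ms DB) are hypothetical, picked only to illustrate the shape of the problem.

```python
def speedup(threads: int, connections: int, cpu_ms: float, db_ms: float) -> float:
    """Estimated speedup over a single-threaded worker (simplified model).

    Assumes every transaction needs `cpu_ms` of CPU work plus `db_ms`
    of time holding one DB connection, with no other contention.
    """
    total_ms = cpu_ms + db_ms
    # Threads alone could give up to `threads`x.
    thread_bound = float(threads)
    # Connections cap throughput at `connections / db_ms` transactions per ms,
    # which is `connections * total_ms / db_ms` times the single-thread rate.
    connection_bound = connections * total_ms / db_ms
    return min(thread_bound, connection_bound)


# 8 threads, but still only 1 DB connection: the "Nx" turns into 1.25x.
print(speedup(threads=8, connections=1, cpu_ms=2, db_ms=8))  # 1.25

# Raise the pool to 8 connections and, in this model, the full 8x appears...
print(speedup(threads=8, connections=8, cpu_ms=2, db_ms=8))  # 8.0
```

The model stops at the connection pool on purpose: in reality the second result is where the goalpost moves again (disk IO, locks, and so on), which this sketch does not capture.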