paradite 3 hours ago
The deployment pattern from Cloudflare looks insane to me. I've worked at one of the top fintech firms. Whenever we did a config change or deployment, we were supposed to have a rollback plan ready and monitor key dashboards for 15-30 minutes. The dashboards had to be prepared beforehand, covering the systems and key business metrics the deployment would affect, and reviewed by teammates. I never saw downtime longer than 1 minute while I was there, because you get a spike on the dashboard immediately when something goes wrong. For the entire system to be down for 10+ minutes due to a bad config change or deployment is just beyond me.
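To make the process above concrete, here is a minimal sketch of a watch-and-rollback gate: it reads error-rate samples during the post-deploy window and triggers a rollback on a sustained spike. The function name, the `spike_factor` and `consecutive` thresholds, and the idea of feeding it a plain sequence of samples are all assumptions for illustration, not anything Cloudflare or the fintech firm is known to use.

```python
from typing import Callable, Iterable


def watch_and_rollback(
    error_rates: Iterable[float],
    baseline: float,
    rollback: Callable[[], None],
    spike_factor: float = 3.0,  # hypothetical threshold: 3x the baseline counts as a spike
    consecutive: int = 2,       # hypothetical: require 2 spiking samples in a row
) -> bool:
    """Watch post-deploy error-rate samples.

    Returns True if the deployment survived the whole watch window,
    or False if a sustained spike triggered the rollback callback.
    """
    over = 0
    for rate in error_rates:
        if rate > baseline * spike_factor:
            over += 1
            if over >= consecutive:
                rollback()
                return False
        else:
            # A single noisy sample does not count; reset the streak.
            over = 0
    return True


if __name__ == "__main__":
    # Samples would normally come from a metrics system during the 15-30 minute window.
    samples = [0.010, 0.050, 0.060, 0.010]
    survived = watch_and_rollback(samples, baseline=0.010,
                                  rollback=lambda: print("rolling back"))
    print("deployment kept" if survived else "deployment reverted")
```

Requiring consecutive spiking samples is one way to avoid rolling back on a single noisy data point. The trade-off is that detection takes at least `consecutive` sampling intervals, which matters more when telemetry arrives with a lag.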
vlovich123 an hour ago
That is also true at Cloudflare, for what it’s worth. However, the company is so big, with so many different products all shipping at the same time, that it can be hard to correlate a problem with your release, especially since there’s a 5 min lag (if I recall correctly) before the monitoring dashboards have all the telemetry from thousands of servers worldwide. Comparing the difficulty of running the world’s internet traffic with hundreds of customer products to your fintech experience is like saying “I can lift 10 pounds. I don’t know why these guys are struggling to lift 500 pounds”.
dehrmann an hour ago
Cloudflare is orders of magnitude larger than any fintech. Rollouts likely take much longer, and having a human monitoring a dashboard doesn't scale.
markus_zhang 2 hours ago
My guess is that CF has so many external customers that they need to move fast and try not to break things. My hunch is that their culture always favors moving fast. As long as they are not breaking too many things, customers won't leave them.
theideaofcoffee 2 hours ago
Same here; my time at an F100 ecommerce retailer showed me the same thing. Every change control board justification needed an explicit back-out/restoration plan with the exact steps to be taken, what would be monitored to confirm the change was behaving as planned, contacts for the main groups expected to be affected, and emergency numbers/rooms for quick conferences in case something did happen. The process was pretty tight, with almost no revenue-affecting outages that I can remember, because it was such a collaborative effort (even though the board presentations seemed a bit spiky and confrontational at the time, everyone was working together).
draw_down 3 hours ago
[dead]