chickenpotpie 8 hours ago

Because it's significantly harder to isolate problems and you'll end up in this loop

* Deploy everything
* It explodes
* Roll back everything
* Spend two weeks finding the problem in one system and then fixing it
* Deploy everything
* It explodes
* Roll back everything
* Spend two weeks finding a new problem that was created while you were fixing the last one
* Repeat ad nauseam

Migrating iteratively gives you a foundation to build upon with each component.

wizzwizz4 8 hours ago | parent [-]

So… create your shadow system piecewise? There is no reason to have "explode production" in your workflow, unless you are truly starved for resources.

paulddraper 4 hours ago | parent [-]

Does this shadow system have usage?

Does it handle queries, trigger CI actions, run jobs?

wizzwizz4 3 hours ago | parent [-]

If you test it, yes.

Of course, you need some way of producing test loads similar to those found in production. One way would be to take a snapshot of production, tap incoming requests for a few weeks, log everything, then replay it at "as fast as we can" speed for testing; another way would be to just mirror production live, running the same operations in test as run in production.
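A rough sketch of what the replay half could look like, assuming requests were captured as JSON lines (method, path, headers, body) by a tap in front of production; the log format and test URL here are made up:

    # Replay captured production requests against the test system,
    # as fast as it will take them (no original pacing).
    # "captured_requests.jsonl" and TEST_BASE are hypothetical.
    import json
    import requests

    TEST_BASE = "https://test.internal.example"

    def replay(log_path: str) -> None:
        with open(log_path) as f:
            for line in f:
                entry = json.loads(line)
                resp = requests.request(
                    method=entry["method"],
                    url=TEST_BASE + entry["path"],
                    headers=entry.get("headers", {}),
                    data=entry.get("body"),
                    timeout=30,
                )
                # If you also logged production's responses, diff them here;
                # this sketch just flags server errors.
                if resp.status_code >= 500:
                    print(f"{entry['method']} {entry['path']} -> {resp.status_code}")

    if __name__ == "__main__":
        replay("captured_requests.jsonl")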

Alternatively, you could take the "chaos monkey" approach (https://www.folklore.org/Monkey_Lives.html), do away with all notions of realism, and just fuzz the heck out of your test system. I'd go with that first, because it's easy and tends to catch the more obvious bugs.
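A crude version of that fuzzing approach, reusing the same hypothetical test endpoint; no realism, just random methods, paths, and payloads to shake out the obvious crashes (requires Python 3.9+ for random.randbytes):

    import random
    import string
    import requests

    TEST_BASE = "https://test.internal.example"  # hypothetical test environment

    def random_path(max_depth: int = 3) -> str:
        # Build a random /seg1/seg2/... path of lowercase letters and digits.
        segs = (
            "".join(random.choices(string.ascii_lowercase + string.digits,
                                   k=random.randint(1, 12)))
            for _ in range(random.randint(1, max_depth))
        )
        return "/" + "/".join(segs)

    def fuzz(iterations: int = 10_000) -> None:
        for _ in range(iterations):
            method = random.choice(["GET", "POST", "PUT", "DELETE", "PATCH"])
            body = random.randbytes(random.randint(0, 512)) if method != "GET" else None
            try:
                resp = requests.request(method, TEST_BASE + random_path(),
                                        data=body, timeout=10)
                if resp.status_code >= 500:
                    print(f"{method} {resp.url} -> {resp.status_code}")
            except requests.RequestException as exc:
                print(f"{method} request failed: {exc}")

    if __name__ == "__main__":
        fuzz()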

chickenpotpie an hour ago | parent [-]

So just double your cloud bill for several weeks, costing a site like GitHub millions of dollars?

How do you handle duplicate requests to external services? Are you going to run credit cards twice? Send emails twice? If not, how do you know it's working with fidelity?