stackskipton | a day ago
Ops type (DevOps/SRE/Sysadmin/whatever you want to call me) here, so I was really interested, and this blog left me with more questions than answers.

What is SI? Homegrown GUI Terraform? That part is not clear in the article. It looks like homegrown GUI Terraform with modules, so that's what I'm going with. Cool, glad you got that working; it sounds like a big project and you were able to pull it off.

However, this part confused me: "Our engineers were investing a lot of time in what felt like “IaC limbo,” making a change in a Terraform file, waiting for review, waiting for CI/CD to run, and only then finding out if it worked. A simple tweak to a networking rule could take hours to validate."

What in tarnation are you doing? Do you have a massive Terraform repo, so apply takes forever because the plan is running forever? Talk to me Goose, what is going on that Terraform changes take hours to run? Our worst folder takes about 10 minutes to plan because it's a massive "everything for this specific project" folder. We also let people run tofu plan/apply from their laptops in Dev, so feedback is instant.

We do have folders that depend on other folders. For example, we can't set up Azure Kubernetes without the network being in place, but we just left a dependson YAML file that our CI/CD pipelines work off of when doing a full rollout, which is not their normal mode of operation (it's for DR only). We also assume that people have not been doing ClickOps, or if they have, that they take responsibility for letting IaC resolve it.

Writing your own API calls to a cloud provider is not something I would wish upon anyone. I did it for a Prometheus HTTP Service Discovery system, and just getting data was difficult. I can't imagine Create/Update/Delete.
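For readers wondering how a pipeline can work off a dependency file like that, here is a minimal sketch, not the commenter's actual setup. The folder names and the `DEPENDS_ON` mapping are hypothetical stand-ins for whatever the YAML file contains, and the ordering uses Python's standard-library `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical mapping of folder -> folders it depends on, standing in for
# the contents of a depends-on YAML file. These names are assumptions.
DEPENDS_ON = {
    "network": [],
    "aks": ["network"],
    "apps": ["aks"],
    "monitoring": ["network"],
}


def rollout_order(depends_on):
    """Return folders ordered so every prerequisite comes before its dependents."""
    try:
        # TopologicalSorter takes a dict of node -> predecessors and
        # static_order() yields predecessors first.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    # A full (DR-style) rollout would run plan/apply per folder in this order.
    for folder in rollout_order(DEPENDS_ON):
        print(folder)
```

In day-to-day use each folder would still be planned and applied on its own; an ordering like this only matters when everything has to be rebuilt from scratch.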
ryanryke | a day ago
Thanks for the feedback. I'm new to the platform, and certainly appreciate the interaction.

I think I described SI a bit better in another reply, and you can certainly check their website for a better description than I can give here. I'll try to summarize our particular issues at a high level to give you a sense of why this is important to us.

Traditionally, we've managed our customers via TF. I made a big push years back to try to standardize how we delivered infrastructure to our customers. We started pushing module libraries, abstracted variables via YAML, and leveraged Terragrunt to try to be as DRY as possible. We followed best practices to try to minimize state files for a reduced blast radius, etc.

What became apparent was that, despite how much we tried to standardize, there was always something that didn't fit between customers. So each customer quickly became a snowflake. It would have its own special version of some module, or some specialized logic to match its workflow.

Then, as the modules evolved over time, questions started to come up:

- Do we go back and update every customer with the new version of the module?
- Does the new module have different provider/submodule/TF version requirements?
- Did the customer make some other changes to infra that aren't captured?

Making minor changes could end up taking way longer than necessary. Making large changes could be a nightmare.

In working with SI, the mindset has shifted. Rather than manage the hypothetical (i.e. what's written in TF), let's manage the actual. Instead of trying to reconcile in code why a container has 2 CPUs instead of 4, find the issue and fix it. If we want to upgrade something, find it and upgrade it.

I can go into greater depth if you care or have questions, but this explains the post a bit more at a high level.
SteveNuts | a day ago
> this blog left me with more questions than answers

Probably because it's a thinly veiled ad. I agree the post is severely lacking details.