| |
| ▲ | sethammons a day ago | parent | next [-] | | Took us three to four years to go from self-hosted multi-DC to getting the main product almost fully into k8s (some parts didn't make sense in k8s and were pushed to our geo-distributed edge nodes). Dozens of services and teams, keeping the old stuff working while changing the tire on the car while driving, all while the company continues to grow and scale doubles every year or so. It takes maturity in testing and monitoring, and it takes longer than everyone estimates. | |
| ▲ | Cpoll 20 hours ago | parent | prev | next [-] | | It sounds like it's not easy to figure out the permissions, envvars, memory size, etc. of your existing system, and that's why the migration is so difficult? That's not really one of Kubernetes' (many) failings. | | |
| ▲ | Vegenoid 19 hours ago | parent | next [-] | | Yes, and now we are back at the ancestor comment’s original point: “at the end of the day kubernetes feels like complexity trying to abstract over complexity, and often I find that's less successful than removing complexity in the first place”. Which I understand to mean: “some people think using Kubernetes will make managing a system easier, but it often will not do that”. | |
| ▲ | Pedro_Ribeiro 18 hours ago | parent | prev [-] | | Can you elaborate on other things you think Kubernetes gets wrong? Asking out of curiosity because I haven't delved deep into it. |
| |
| ▲ | tail_exchange a day ago | parent | prev | next [-] | | It largely depends on how customized each microservice is and how many people are working on the project. I've seen migrations of thousands of microservices happen within the span of two years. Longer timeline, yes, but the number of microservices is orders of magnitude larger. Though I suppose the organization works differently at this level. The Kubernetes team built a tool to migrate the microservices, and each owner was asked to perform the migration themselves. Small microservices could be migrated in less than three days, while the large and risk-critical ones took a couple of weeks. This all happened in less than two years of calendar time, though it took more than that in engineer-weeks. The project was very successful, though: the company spends way less money now because of the autoscaling features and the ability to run multiple microservices on the same node. Regardless, if the company is running 12 microservices and this number is expected to grow, this is probably a good time to migrate. How did they account for the different shapes of services (stateful, stateless, leader-elected, cron, etc.), networking settings, styles of deployment (blue-green, rolling updates, etc.), secret management, load testing, bug bashing, gradual rollouts, dockerizing the applications, etc.? If it's taking 4x longer than originally anticipated, it seems like there was a massive failure in project design. | | |
| ▲ | hedora a day ago | parent [-] | | 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career). Similarly, the actual migration times you estimate add up to decades of engineer time. It’s possible kubernetes saves more time than using the alternative costs, but that definitely wasn’t the case at my previous two jobs. The jury is out at the current job. I see the opportunity cost of this stuff every day at work, and am patiently waiting for a replacement. | | |
| ▲ | tail_exchange a day ago | parent | next [-] | | > 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career). Not really, they only had to use the tool to run the migration and then validate that it worked properly. As the other commenter said, a very basic Kubernetes setup is not that hard; the difficult setup is left to the devops team, while the service owners just need to know the basics. But sure, we can estimate it at 38 engineering years. That's still 38 years for 2,000 microservices; it's way better than 1 year for 12 microservices as in OP's case. The savings we got were enough to offset those 38 years of work, so this project is now paying dividends. | |
| ▲ | mschuster91 a day ago | parent | prev [-] | | > 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career). Learning k8s enough to be able to work with it isn't that hard. Have a centralized team write up a decent template for a CI/CD pipeline, a Dockerfile for the most common stacks you use, and a Helm chart with an example Deployment, PersistentVolumeClaim, Service and Ingress; distribute that, and be available for support whenever a team's needs go beyond "we need 1-N pods for this service, they get some environment variables from which they are configured, and maybe a Secret/ConfigMap if the application wants its configuration to be done in files". That is enough in my experience. | | |
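A minimal sketch of the kind of starter template described above, written as plain Kubernetes manifests rather than a full Helm chart; the service name, image, and port are illustrative placeholders, not anything from this thread:

    # deployment.yaml: one Deployment running N replicas of the service,
    # configured through environment variables from a ConfigMap and a Secret
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-service
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: example-service
      template:
        metadata:
          labels:
            app: example-service
        spec:
          containers:
            - name: app
              image: registry.example.com/example-service:1.0.0   # placeholder
              ports:
                - containerPort: 8080
              envFrom:
                - configMapRef:
                    name: example-service-config    # plain configuration
                - secretRef:
                    name: example-service-secrets   # credentials
    ---
    # service.yaml: stable in-cluster address in front of the pods; an Ingress
    # and a PersistentVolumeClaim would be added the same way when needed
    apiVersion: v1
    kind: Service
    metadata:
      name: example-service
    spec:
      selector:
        app: example-service
      ports:
        - port: 80
          targetPort: 8080

In a real Helm chart the name, image, replica count and environment references would come from values.yaml, which is what makes one template reusable across teams.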
| ▲ | relaxing a day ago | parent [-] | | > Learning k8s enough to be able to work with it isn't that hard. I’ve seen a lot of people learn enough k8s to be dangerous. Learning it well enough to not get wrapped around the axle with some networking or storage details is quite a bit harder. | | |
| ▲ | mschuster91 a day ago | parent [-] | | For sure, but that's the job of a good ops department. Where I work, for example, every project's CI/CD pipeline has its own IAM user mapping to a Kubernetes role that only has explicitly defined capabilities: create, modify and delete for just the utter basics. Even if they committed something into the Helm chart that could cause an annoyance, the service account wouldn't be able to call the required APIs. And the templates themselves come with security built in: privileges are all explicitly dropped, pod UIDs/GIDs are hardcoded to non-root, and we're now deploying NetworkPolicies, at least for ingress. Only egress network policies are still missing; we haven't been able to make those work with our services. Anyone wishing to do things like use the RDS database provisioner gets an introduction from us on how to use it and what the pitfalls are, plus regular reviews of their code. They're flexible, but we keep tabs on what they're doing, and when they have done something useful we aren't shy about integrating it into our shared template repository. |
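Spelled out in plain Kubernetes objects, guardrails like the ones described above might look roughly like this; the namespace, role name, UID and image are illustrative, not taken from the setup being described:

    # rbac.yaml: namespace-scoped role bound to the project's CI/CD identity,
    # allowing create/update/delete only on the basics
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: ci-deployer
      namespace: example-project
    rules:
      - apiGroups: ["apps"]
        resources: ["deployments"]
        verbs: ["get", "list", "create", "update", "patch", "delete"]
      - apiGroups: [""]
        resources: ["services", "configmaps", "secrets"]
        verbs: ["get", "list", "create", "update", "patch", "delete"]
    ---
    # Security defaults a shared template can hardcode: non-root user, no
    # privilege escalation, all capabilities dropped (shown on a bare Pod for
    # brevity; in a chart this sits in the Deployment's pod template)
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-service
      namespace: example-project
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
      containers:
        - name: app
          image: registry.example.com/example-service:1.0.0
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
    ---
    # Default-deny ingress NetworkPolicy: traffic to pods must be allowed explicitly
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: example-project
    spec:
      podSelector: {}
      policyTypes: ["Ingress"]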
| |
| ▲ | zdragnar a day ago | parent | prev | next [-] | | Comparing the simplicity of two PHP servers against a setup with a dozen services is always going to be one-sided. The difference in complexity alone is massive, regardless of whether you use k8s or not. My current employer did something similar, but with fewer services. The upshot is that with terraform and helm and all the other yaml files defining our cluster, we have test environments on demand, and our uptime is 100x better. | |
| ▲ | loftsy a day ago | parent | prev | next [-] | | Fair enough, that sounds hard. Memory size is an interesting example: a typical Kubernetes deployment has much more control over this than a typical non-container setup. It costs you something to figure out the right setting, but in the long term you are rewarded with a more robust and more re-deployable application. | | |
| ▲ | otabdeveloper4 10 hours ago | parent [-] | | > has much more control over this than a typical non-container setup Actually not true, k8s uses the exact same cgroups API for this under the hood that systemd does. |
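For reference, the knob both comments are talking about is the resources stanza on a container; a minimal sketch with made-up values. The kubelet enforces the memory limit through cgroups, the same mechanism systemd drives with MemoryMax=:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-service
    spec:
      containers:
        - name: app
          image: registry.example.com/example-service:1.0.0   # placeholder
          resources:
            requests:
              cpu: "250m"       # used by the scheduler to place the pod
              memory: "256Mi"
            limits:
              memory: "512Mi"   # enforced as a cgroup memory limit; the
                                # container is OOM-killed if it exceeds this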
| |
| ▲ | jrs235 a day ago | parent | prev | next [-] | | > I don't see how anybody says they'd move a large company to kubernetes in such an environment in a few months with no screwups and solid testing. Unfortunately, I do. Somebody says that when the culture of the organization expects to hear what it wants to hear rather than the cold hard truth. And likely the person saying it says it from a perch up high, not responsible for the day-to-day work of actually implementing the change. I see this happen when the person, in management or leadership, lacks the skills and knowledge to perform the work themselves. They've never been in the trenches and had to actually deal face to face with the devil in the details. | |
| ▲ | malux85 20 hours ago | parent | prev [-] | | Canary deploy, dude (or dude-ette): route 0.001% of service traffic and then slowly move it over. Then set error budgets. Then a bad service won't "bring down production". That's how we did it at Google (I was part of the core team responsible for ad serving infra: billions of ads to billions of users a day) |
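On Kubernetes, one common way to get a weighted canary like this (not necessarily how Google does it) is the ingress-nginx canary annotations; a rough sketch with illustrative names. Note that canary-weight takes whole-number percentages, so a split as fine as 0.001% generally needs a service mesh or a weighted load balancer in front instead:

    # canary-ingress.yaml: send ~1% of traffic for example.com to the canary
    # Service while the existing Ingress keeps serving the rest
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-service-canary
      annotations:
        nginx.ingress.kubernetes.io/canary: "true"
        nginx.ingress.kubernetes.io/canary-weight: "1"
    spec:
      ingressClassName: nginx
      rules:
        - host: example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: example-service-canary
                    port:
                      number: 80

Ramping the rollout is then a matter of raising canary-weight step by step while watching the error budget.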
| |
| ▲ | jrockway 16 hours ago | parent [-] | | Yup, I like this approach a lot. With cloud providers considering VMs durable these days (they get new hardware for your VM if the hardware it's on dies, without dropping any TCP connections), I think a one-node approach is enough for small things. You can get like 192 vCPUs per node. This is enough for a lot of small companies. I occasionally try non-k8s approaches to see what I'm missing. I have a small ARM machine that runs Home Assistant and some other stuff. My first instinct was to run k8s (probably kind, honestly), but I didn't really want to write a bunch of manifests and let myself scope-creep into running ArgoCD. I decided on `podman generate systemd` instead (with nightly re-pulls of the "latest" tag; I live and die by the bleeding edge). This was OK until I added zwavejs, and now the versions sometimes get out of sync, which I notice by a certain light switch not working anymore. What I should have done instead was keep some sort of git repository with the versions of these two things, and update them atomically at the exact same time. Oh wow, I really did need ArgoCD and Kubernetes ;) I get by with podman by angrily ssh-ing in, in my winter jacket, when I'm trying to leave my house but can't turn the lights off. Maybe this can be blamed on auto-updates, but frankly anything exposed to a network that is out of date is also a risk, so I don't think you can ever really win. |
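A low-tech way to get the "update both versions atomically" property without ArgoCD is to pin both image tags in a single compose file kept in git, so a version bump is one reviewed commit; a sketch using the commonly published images, with placeholder tags and device paths:

    # compose.yaml (podman-compose / docker-compose): both versions live in one
    # file, so Home Assistant and Z-Wave JS UI always move together
    services:
      homeassistant:
        image: ghcr.io/home-assistant/home-assistant:2024.6.4   # placeholder tag
        network_mode: host
        volumes:
          - ./ha-config:/config
        restart: unless-stopped
      zwavejs:
        image: zwavejs/zwave-js-ui:9.14.0                       # placeholder tag
        devices:
          - /dev/ttyUSB0:/dev/ttyUSB0    # Z-Wave controller stick (adjust path)
        ports:
          - "8091:8091"                  # web UI
          - "3000:3000"                  # websocket server Home Assistant talks to
        restart: unless-stopped

It trades the nightly "latest" pulls for explicit upgrades, which is most of what the git-repository-of-versions idea above asks for.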