> I’ve later learned that restarting a container that is part of a pod will have the (to me, unexpected) side-effect to restart all the other containers of that pod.

Anyone know why this is? Or, for that matter, why Kubernetes seems to work like this too?

I have an application for which the natural solution would be to create a pod and then, as needed, create and destroy containers within the pod. (Why? Because I have some network resources that don’t really virtualize, so they can live in one network namespace. No bridges.)

But despite containerd and Podman and Kubernetes kind-of-sort-of supporting this, they don’t seem to actually want to work this way. Why not?

kace91 9 hours ago

>Anyone know why this is? Or, for that matter, why Kubernetes seems to work like this too?

Pods are specifically not meant to be treated as VMs, but as single application/deployment units.

Among other things, if a container goes down you don’t know if it corrupted shared state (leaving sockets open or whatever). So you don’t know if the pod is healthy after the restart. Also, reviving it might not work if the original startup process relied on some boot order. So to guarantee a return to a healthy state you need to restart the whole thing.

amluto 5 hours ago

> Among other things, if a container goes down you don’t know if it corrupted shared state (leaving sockets open or whatever).

This is not a thing. A program that opens a socket and crashes does not leak that socket for the lifetime of the network namespace. (Keep in mind that ordinary non-containerized servers usually have exactly one network namespace. If a program crashes, you restart it. Sure, CLOSE_WAIT is a thing, but it’s neither permanent nor usually a big deal.)
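
The claim that a crashed program does not leak its socket can be checked with a small experiment. The following is only a sketch, assuming Linux (or another POSIX system) and Python 3.9+; the helper names `can_bind` and `crash_and_rebind` are made up for illustration. It covers the listening-socket case only, not connections that were already established when the process died.

```python
import socket
import subprocess
import sys

# Child program: open a listening socket on an ephemeral port, print the port, then hang.
CHILD = r"""
import socket, time
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen()
print(srv.getsockname()[1], flush=True)
time.sleep(600)
"""


def can_bind(port: int) -> bool:
    """Return True if a fresh TCP listener can bind 127.0.0.1:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind(("127.0.0.1", port))
            probe.listen()
            return True
        except OSError:
            return False


def crash_and_rebind() -> tuple[bool, bool]:
    """Probe the child's port while it is alive and again after it is killed."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    port = int(child.stdout.readline())
    while_alive = can_bind(port)   # the child still owns the port
    child.kill()                   # SIGKILL on POSIX: no cleanup code runs in the child
    child.wait()
    child.stdout.close()
    after_crash = can_bind(port)   # the kernel closed the socket when the child died
    return while_alive, after_crash


if __name__ == "__main__":
    print(crash_and_rebind())
```

On Linux this should report that the bind fails while the child is alive and succeeds immediately after it is killed, all within the same network namespace.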

gucci-on-fleek 9 hours ago

> Anyone know why this is?

In Podman, a pod is essentially just a single container; each "container" within a pod is just a separate rootfs. So from that perspective it makes sense, since you can't really restart half of a container. (That said, I think it might be possible to restart individual containers within a pod; but if any container within a pod fails, I think the whole pod will automatically restart.)

> Why? Because I have some network resources that don’t really virtualize, so they can live in one network namespace.

You can run separate containers in the same network namespace with the "--network" option [0]. You can either start one container with its own automatic netns and then join the other containers to it with "--network=container:<name>", or you can manually create a new netns with "podman network create <name>" and then join all the containers to it with "--network=<name>".

[0]: https://docs.podman.io/en/latest/markdown/podman-run.1.html#...
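
The first variant above ("--network=container:<name>") can be sketched as a small dry-run helper. This only builds and prints the command lines and does not run Podman. The container names, the image, and the `sleep 3600` keep-alive command are placeholders, and `shared_netns_commands` is a hypothetical helper, not part of any Podman tooling.

```python
import shlex


def shared_netns_commands(holder: str, joiners: list[str], image: str) -> list[list[str]]:
    """Build `podman run` argv lists: one container owns the netns, the rest join it."""
    keep_alive = ["sleep", "3600"]  # placeholder workload so the containers stay up
    # The first container gets its own automatically created network namespace.
    cmds = [["podman", "run", "-d", "--name", holder, image, *keep_alive]]
    for name in joiners:
        # Every other container reuses the first container's network stack.
        cmds.append(
            ["podman", "run", "-d", "--name", name,
             f"--network=container:{holder}", image, *keep_alive]
        )
    return cmds


if __name__ == "__main__":
    # All names and the image below are made-up examples.
    for argv in shared_netns_commands(
        "netns-holder", ["worker-a", "worker-b"], "docker.io/library/alpine"
    ):
        print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would execute the commands on a machine where Podman is installed.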

amluto 9 hours ago

> You can run separate containers in the same network namespace with the "--network" option [0].

Oh, right, thanks. I think I did notice that last time I dug into this. But:

> or you can manually create a new netns with "podman network create <name>" and then join all the containers to it with "--network=<name>".

I don’t think this has the desired effect at all. And the docs for podman network connect don’t mention pods at all, which is odd. In general, I have not been very impressed by podman.

Incidentally, apptainer seems to have a more or less first class ability to join an existing netns, and it supports CNI. Maybe I should give it a try.

gucci-on-fleek 2 hours ago

> > or you can manually create a new netns with "podman network create <name>" and then join all the containers to it with "--network=<name>".
>
> I don’t think this has the desired effect at all.

Well, I'm not entirely sure what effect you're wanting here, but I use this option for some of the containers that I run, and it makes it so that all containers in that network can reach each other, while anything outside that network can't. You can also use "--network=ns:/run/user/$UID/netns/<file-name>" to join a container to a manually created network namespace (created with "ip netns add <file-name>") if you need more control.

stryan 9 hours ago

Yeah, I was a little confused at this line; as far as I can tell you can restart containers that are part of a Podman pod without restarting the whole pod just fine. I just verified this on one of my MicroOS boxes running Podman v5.7.1.

Podman was changing pretty fast for a while so it could be an older version thing, though I'd assume FCOS is on Podman 5 by now.

esseph 8 hours ago

The general idea is that you want a single application per pod, unless you need a sidecar service to live in the same pod as each instance of your app.

You are normally running several instances of your frontend so that it can crash without impacting the user experience, or so it can get deployed to in a rolling manner, etc.

amluto 4 hours ago

I’m fine with this being the general idea. But it seems a bit unfortunate to make it be the only idea.

> You are normally running several instances of your frontend so that it can crash without impacting the user experience, or so it can get deployed to in a rolling manner, etc.

Err, the classic way to do this is to hand off the listening socket from one server instance to the next. You can’t do this if your orchestration tools insist on tearing down the entire network namespace to update the server.

Sure, you can use fancy load balancers or software defined networking or firewall kludges to hand off something that functions like a listening socket, but it kind of feels like we lost the plot somehow. The old techniques work, and they often worked at the appropriate scale for the application. Why are we building new systems that can’t be made to work well without extra layers?

In any event, the feature I want isn’t rocket science. I think Kubernetes would need to add two special kinds of Pods:

1. A joinable Pod that explicitly permits other Pods to join with it (this would be a genuine Pod with some special attributes).

2. A subsidiary Pod that depends on a joinable Pod and joins its network namespace. This would almost be a real pod except that it would have no network namespace of its own and hence no normal managed hostname or addresses.

#2 is a bit weird, but there’s precedent. A hostNetwork: true Pod is already weird in exactly the same way.
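
One common way to do the listening-socket handoff mentioned above is to pass the file descriptor over a Unix domain socket (SCM_RIGHTS). The following is a minimal sketch, assuming a POSIX system and Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`). The function names `hand_off`, `take_over`, and `demo` are made up for illustration, and both "instances" run in one process here to keep the example self-contained. In a real deployment the two ends would be separate server processes in the same network namespace.

```python
import socket


def hand_off(listener: socket.socket, channel: socket.socket) -> None:
    """Old instance: send the listening socket's fd over a Unix socket, then drop our copy."""
    socket.send_fds(channel, [b"listener"], [listener.fileno()])
    # The kernel socket stays open: the in-flight message holds a reference to it.
    listener.close()


def take_over(channel: socket.socket) -> socket.socket:
    """New instance: receive the fd and wrap it in a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])


def demo() -> tuple[bytes, bool]:
    """Hand a listener from an 'old' to a 'new' owner while a client is waiting."""
    old_end, new_end = socket.socketpair()  # AF_UNIX pair standing in for two processes
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]

    # A client connects before the handoff; the connection waits in the accept queue.
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"hello")

    hand_off(listener, old_end)
    inherited = take_over(new_end)

    # The new owner serves the queued connection on the same address and port.
    conn, _peer = inherited.accept()
    data = conn.recv(5)
    same_port = inherited.getsockname()[1] == port

    for s in (conn, client, inherited, old_end, new_end):
        s.close()
    return data, same_port


if __name__ == "__main__":
    print(demo())
```

The listening socket is never closed from the kernel's point of view, so clients that connect during the handoff are queued rather than refused. This only works when the old and new server can reach the same socket, which is why it depends on the network namespace outliving any single server instance.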
