jchw 3 days ago

Hmm.

On one hand, I don't deploy or use Docker itself. To me, "Docker" is more of a concept than a tool. Due to the way networking works in actual Docker, I don't like using it on dev machines or in production, even for smaller cases.

On dev machines, I usually use rootless Podman, because it is simple and does everything I need without needing real root privileges, and on production machines I usually use some variation of Kubernetes, because it is extremely useful to have a distributed job scheduler with support for periodic tasks, replication, generic resource allocation, secrets management, metrics, logging, etc.

I do not consider rootless Podman or Kubernetes to be a strong security boundary, i.e. I would never run completely untrusted code in Podman or Kubernetes directly. However, they are sufficient to isolate single-tenant workloads running code that I do trust.

So far, I feel like I am disagreeing with most of your points. Hazardous? Honestly, laughable. curl'ing a binary and running it has most of the same downsides, but with less isolation, and it's arguably harder to manage. No matter how you deploy things, no matter how powerful your tools are, you still have to be responsible. The due diligence required does sometimes negate the benefit of having a powerful tool, though: just because the tool lets you ignore some things does not mean that you can actually afford to ignore those things.

What's much more hazardous, IMO, is developers running arbitrary code from remote sources locally on dev machines with minimal or no isolation at all. Make one typo in an `npx` command and you might be screwed. Sure, it's not running as root... but it would be easy enough for it to poison your shell environment and catch you out the next time you try to run sudo, or do any number of other things, like steal or use local AWS/Kubernetes/etc. credentials.

But I will agree with you on one thing: you should be wary of trusting random Docker images. They're not necessarily secure. You should vet the images you use. Ideally, those images should be as minimal as possible. If at all possible, they should be a single statically-linked binary; if not that, then certainly the absolute bare minimum Alpine userland.
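To make that concrete, here's a rough sketch of what "minimal" can look like (a sketch, not a recipe; `myapp` is a hypothetical prebuilt static binary):

    # A pinned, minimal Alpine image running a single prebuilt static
    # binary as an unprivileged user. "myapp" is a placeholder.
    FROM alpine:3.20
    COPY myapp /usr/local/bin/myapp
    USER nobody
    ENTRYPOINT ["/usr/local/bin/myapp"]

Pinning a specific tag (or, better, an image digest) at least makes it clear what you're actually vetting.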

...But, despite that, I think there's so much unneeded fervor over out-of-date Docker images. Part of it is because people want to sell their products, and boy, it's so easy to scan a Docker layer and go "See, look! It has this old vulnerability!" Sure, I don't disagree that it's bad that ancient bugs are circulating in out-of-date Docker layers; it's a problem that can and should be fixed in most cases. But, honestly, the vast majority of "vulnerability" reports on Docker layers are useless noise.

Even this one: the XZ Utils backdoor? Docker containers are not virtual machines. I do admit that there are some Docker images that run OpenSSH servers, but it's pretty rare. The gold standard is for a Docker container to map to a single process. You don't really need an OpenSSH server in a Docker container most of the time, unless you're trying to do something specific that needs it (accepting Git SSH connections, maybe?). And if you're not running OpenSSH in the container, or, even sillier, if you don't even have OpenSSH installed, the vulnerability is just dead code. Anything you could possibly do to the container to exploit it wouldn't do any good, because if you already have the ability to start an OpenSSH server in there, you don't need the XZ backdoor anymore anyway.

This is the reality of most Docker "vulnerabilities". They're not all nonsense, but most of them are simply not real issues that can have any actual impact.

Still, I think the way most Docker images are built and distributed could be improved.

If we're going to use Dockerfiles, then I think a reproducible system like StageX[1] is much more compelling than starting with `FROM debian:...` in most cases. For many simple cases, like with Go, it's sufficient to just build the binary in one build stage and then chuck it into a `FROM scratch` image.
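A minimal sketch of that pattern (module layout and names are hypothetical):

    # Build stage: compile a static Go binary.
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY . .
    # CGO disabled so the result has no libc dependency and runs in "scratch".
    # "./cmd/app" is a placeholder for your main package.
    RUN CGO_ENABLED=0 go build -o /app ./cmd/app

    # Final stage: an empty image containing only the binary.
    # (Traditionally you'd also copy in CA certificates and tzdata here
    # if the program needs them.)
    FROM scratch
    COPY --from=build /app /app
    ENTRYPOINT ["/app"]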

(Side note: recently, I realized that you can entirely avoid copying anything from Alpine these days for Go binaries. You can embed tzdata into a Go binary with a blank import of "time/tzdata" now, or by building with `-tags timetzdata`. Another enterprising person made a similar package for the Mozilla root certificates, and they bothered to make it update automatically, so you can use that too: github.com/breml/rootcerts - and then you don't need to do any funny business, and you have a Go binary that "just works" in `FROM scratch`.)
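With that side note applied, the sketch above gets even barer - nothing is copied from Alpine or anywhere else (again hypothetical names; the certificate part is assumed to come from a blank import in your own Go source):

    # Build stage: timezone data embedded via the build tag; CA certificates
    # assumed to be embedded by a blank import of github.com/breml/rootcerts
    # somewhere in the program's source.
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -tags timetzdata -o /app ./cmd/app

    # Final stage: no /etc/ssl/certs, no /usr/share/zoneinfo, just the binary.
    FROM scratch
    COPY --from=build /app /app
    ENTRYPOINT ["/app"]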

But we don't really need to use Dockerfiles; a lot of tools can build OCI images. I sometimes build OCI images using Nix. In that case, there's basically no practical disadvantage to using Docker; it's just another way to deploy and schedule stuff that is already constructed with Nix.

And if I can just build the software with Nix, I could always just deploy and schedule it with Nix, so the obvious question remains: why even bother with Docker? Again, it's because Docker/Podman/Kubernetes offer really useful capabilities. None of the capabilities these tools offer is impossible to get elsewhere, but, for example... Reliable, distributed cronjob scheduling across multiple machines is not typically a trivial thing to build, but it comes out of the box on Kubernetes, including scheduling constraints that control the distribution of load, ensure that workloads run on machines suited to their characteristics and needs, keep them away from certain other workloads, or what have you.

And due to their nature, OCI containers are very flexible. Is Docker's use of Linux namespacing just not enough? No problem: you can get better isolation in several different ways. You can use gVisor[2] as a container runtime, giving you a much stronger sandbox, or you can make use of hardware virtualization with something like firecracker-containerd[3]. Same OCI image, stronger isolation guarantees - and you get multiple choices for how to get there, too.

Or you can ignore all this crap, ban Docker, and use old-school VMs, of course, but don't forget to `apt update && apt dist-upgrade` and move to the latest Debian/Ubuntu/whatever on each machine every so often, otherwise you're right back at the same problem as with the Docker images... You still have to update the software for it to be updated, after all. (You can do unattended upgrades, too, but nothing stops you from doing the same sort of thing with Docker/Kubernetes, if you like to live dangerously.)

Personally, though, if I were going to ban anything company-wide, my first choice would be the practice of running dev tools locally with no isolation. You know, there's that Docker tool you can use to get some lightweight local isolation...

[1]: https://stagex.tools/

[2]: https://gvisor.dev/

[3]: https://github.com/firecracker-microvm/firecracker-container...