Just a note - you can limit CPU usage on Docker containers by setting --cpus="0.5" (or cpus: 0.5 in Docker Compose) if you expect a container to be very lightweight. This isolation can help prevent one rowdy container from hitting the rest of the system, regardless of whether it's crypto-mining malware, a DDoS attempt or a misbehaving service/software.
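
Roughly, the CLI form looks like the sketch below. The image name is a placeholder I made up, not a real image, and 0.5 is just the value mentioned above:

```shell
# Limit the container to at most half of one CPU's worth of time.
# "example/lightweight-service:latest" is a hypothetical image name.
docker run --rm --cpus="0.5" example/lightweight-service:latest
```

The limit is a cap on CPU time, so the container is throttled when it hits the cap; it is not pinned to a particular core.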

tracker1 5 days ago

Another is running containers in read-only mode, assuming they support this configuration... it will remove a lot of potential attack surface.

3eb7988a1663 4 days ago

Never looked into this. I would expect the majority of images would fail in this configuration. Or am I unduly pessimistic?

hxtk 4 days ago

Many fail if you do it without any additional configuration. In Kubernetes you can mostly get around it by mounting `emptyDir` volumes to the specific directories that need to be writable, `/tmp` being a common culprit. If a directory needs to be writable and also has content that exists in the base image, you'd usually mount an `emptyDir` to `/tmp` in an `initContainer` and copy the content into it, then mount the same `emptyDir` volume to the original location in the runtime container.

Unfortunately, there is no way to specify those `emptyDir` volumes as `noexec` [1].

I think the docker equivalent is `--tmpfs` for the `emptyDir` volumes.

1: https://github.com/kubernetes/kubernetes/issues/48912
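
A rough sketch of the Docker side, assuming `/tmp` is the only path the app needs to write to. The image name and the 64m size are placeholders:

```shell
# Root filesystem is mounted read-only; only /tmp is writable,
# backed by an in-memory tmpfs.
# "example/app:latest" and size=64m are hypothetical values.
docker run --rm --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  example/app:latest
```

Unlike the `emptyDir` case, the `--tmpfs` mount options here do let you ask for `noexec`.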

flowerthoughts 4 days ago

Readonly and rootless are my two requirements for Docker containers. Most images can't run readonly because they try to create a user in some startup script. Since I want my UIDs unique to isolate mounted directories, that user creation is pointless for me anyway. I end up having to wrap or copy Dockerfiles to make them behave reasonably.

Having such a nice layered buildsystem with mountpoints, I'm amazed Docker made readonly an afterthought.

subscribed 4 days ago

I like steering docker runs with docker-compose, especially with .env files - easy to store in repositories, easy to customise and have sane defaults.

flowerthoughts 4 days ago

Yeah agreed. I use docker-compose. But it doesn't help if the Docker images try to update /etc/passwd, or force a hardcoded UID, or run some install.sh at runtime instead of buildtime.

tracker1 4 days ago

It's hit or miss... you sometimes have to make /tmp or another data directory writable... some images just don't operate right because of initialization steps that happen on first run. It depends on the image... but a lot of your own apps can definitely be made to work with a limited, or no, write surface.

s_ting765 4 days ago

Depends on the specific app and use case. Nginx doesn't work with it but Valkey will.

freedomben 5 days ago

This is true, but it's also easy to set at one point and then later introduce a bursty endpoint that ends up throttled unnecessarily. Always a good idea to be familiar with your app's performance profile but it can be easy to let that get away from you.

moebrowne 4 days ago

While this is a good idea, I wonder if doing this could allow the intrusion to go undetected for longer - how many people/monitoring systems would notice a small increase in CPU usage compared to all CPUs being maxed out?

miladyincontrol 5 days ago

Soft and hard memory limits are worth considering too, regardless of container method.
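
For Docker specifically, a minimal sketch of both limits might look like this. The values and image name are placeholders, not recommendations:

```shell
# --memory is the hard cap on the container's memory.
# --memory-reservation is the soft limit, which comes into play
# when the host is under memory pressure; it must be below --memory.
# "example/app:latest" is a hypothetical image name.
docker run --rm \
  --memory="256m" \
  --memory-reservation="128m" \
  example/app:latest
```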

jakelsaunders94 5 days ago

This is a great shout actually. Thanks for pointing it out!

fragmede 5 days ago

The other thing to note is that Docker is, for the most part, stateless. So if you're running something that has to deal with questionable user input (images and video or, more importantly, PDFs), one option is to stick it on its own VM, cycle the Docker container every hour and the VM every 12, and then still be worried about it getting hacked and leaking secrets.
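
One recycle pass could be sketched like this, assuming the container keeps no state you care about. The container and image names are placeholders, and you'd schedule it hourly with cron, a systemd timer or similar:

```shell
# Remove the old container and start a fresh one from the image.
# "docker restart" would keep the container's writable layer,
# so anything dropped into it would survive the restart.
# "pdf-worker" and "example/pdf-worker:latest" are hypothetical names.
docker rm -f pdf-worker 2>/dev/null
docker run -d --name pdf-worker example/pdf-worker:latest
```

Mounted volumes are not reset by this, so anything written there persists across cycles.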

Koffiepoeder 4 days ago

If I can get in once, I can do it again an hour later. I'd be inclined to believe that dumb recycling is not very effective against a persistent attacker.

Saris 3 days ago

I wonder if a crypto miner like this was a person doing the work, or just an automated thing someone wrote to scan IPs for known vulnerabilities and exploit them automatically.

tgtweak 4 days ago

Most of this is mitigated by running Docker in an LXC container (like Proxmox does), which grants a lot more isolation than Docker on its own - closer in nature to running separate VMs.

butvacuum 4 days ago

Too bad it straight up doesn't work without heavy mods in PVE 9.

tgtweak 4 days ago

Illumos had a really nice stack for running containers inside jails and zones... I wonder if any of that ever made it into the Linux world. If you broke out of the container you'd just be inside a jail, which is even more hardened.

cyphar 3 days ago

SmartOS constructed a container-like environment using LX-branded zones; it didn't create an in-kernel equivalent to Linux's namespaces and then nest that in a zone. You're probably thinking of the KVM port to Solaris/illumos, which does run in a zone internally to provide additional protection.

While LX-branded zones were a really cool tech demo, maintaining compatibility with Linux long-term would be incredibly painful and you're bound to find all sorts of horrific bugs in production. I believe that Oxide uses KVM to run their Linux guests.

Linux has always supported nested namespaces and you can run Docker containers inside LXC (or Incus) fairly easily. Note that while it does add some additional protection (in particular, it transparently adds user namespaces, which are a critical security feature most people still do not enable in Docker), it is still the same technology as containers, so kernel bugs still pose a similar risk.
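
For plain Docker, a hedged sketch of checking for and enabling user namespace remapping. This assumes a stock `dockerd` setup; on most systems you'd put the equivalent `"userns-remap": "default"` setting in the daemon's `daemon.json` and restart the service rather than launch `dockerd` by hand:

```shell
# Show the daemon's active security options; with remapping enabled
# the list should include an entry for userns.
docker info --format '{{.SecurityOptions}}'

# Start the daemon with containers remapped to an unprivileged
# host UID/GID range ("default" uses the dockremap user).
dockerd --userns-remap="default"
```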

tgtweak 3 days ago

Yes it was SmartOS - bcantrill worked on it post-oracle. I remembered Illumos since it was the precursor.