eyberg 16 hours ago

Containers got popular at a time when an increasing number of people were finding it hard to install software on their systems locally - especially if you were, for instance, having to juggle multiple versions of ruby or multiple versions of python, each linked to various major versions of c libraries.

Unfortunately containers have always had an absolutely horrendous security story and they degrade performance by quite a lot.

The hypervisor is not going away anytime soon - it is what the entire public cloud is built on.

While you are correct that containers do add more layers - unikernels go the opposite direction and actively remove those layers. Also, imo the "attack surface" is by far the smallest security benefit - other architectural concepts, such as the complete lack of an interactive userland, are far more beneficial when you consider what an attacker actually wants to do after landing on your box (e.g. run their software).

When you deploy to AWS you have two layers of linux - one that AWS runs and one that you run - but you don't really need that second layer and you can have much faster/safer software without it.

m132 15 hours ago | parent | next [-]

I can understand the public cloud argument; if the cloud provider insists on you delivering an entire operating system to run your workloads, a unikernel indeed slashes the number of layers you have to care about.

Suppose you control the entire stack though, from the bare metal up. (Correct me if I'm wrong, but) Toro doesn't seem to run on real hardware; you have to run it atop QEMU or Firecracker. In that case, what difference does it make if your application makes I/O requests through the paravirtualized interfaces of the hypervisor or talks directly to the host via system calls? Both ultimately lead to the host OS servicing the request. There isn't any notable difference between the kernel/hypervisor and user/kernel boundaries in modern processors either; most of the time, privilege escalations come from errors in the software running in the privileged modes of the processor.

Technically, in the former case, besides exploiting the application, a hypothetical attacker will also have to exploit a flaw in QEMU to start processes or gain further privileges on the host, but that's just due to a layer of indirection. You can accomplish this without resorting to hardware virtualization. Once in QEMU, the entire assortment of your host's system calls and services is exposed, just as if you ran your code as a regular user space process.

This is the level at which you want to block exec() and other functionality your application doesn't need, so that neither QEMU nor your code run directly can perform anything out of their scope. Adding a layer of indirection while still leaving the user/kernel or unikernel/hypervisor junction points unsupervised will only stop unmotivated attackers looking for low-hanging fruit.
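
To make that concrete, here is a minimal sketch of supervision at that level: a seccomp filter that makes exec-family syscalls fatal for whatever runs under it, QEMU included. This assumes Linux on x86-64 and uses Python's ctypes for brevity; a real filter must also check seccomp_data.arch, and in practice you'd reach for libseccomp or systemd's SystemCallFilter= rather than hand-rolled BPF:

    import ctypes, os, struct

    libc = ctypes.CDLL(None, use_errno=True)

    def insn(code, jt, jf, k):               # one classic-BPF instruction
        return struct.pack("HBBI", code, jt, jf, k)

    EXECVE, EXECVEAT = 59, 322               # x86-64 syscall numbers
    ALLOW, KILL = 0x7FFF0000, 0x80000000     # SECCOMP_RET_ALLOW / _KILL_PROCESS

    prog = b"".join([
        insn(0x20, 0, 0, 0),                 # ld  seccomp_data.nr
        insn(0x15, 1, 0, EXECVE),            # nr == execve?   -> kill
        insn(0x15, 0, 1, EXECVEAT),          # nr == execveat? -> kill
        insn(0x06, 0, 0, KILL),              # kill the whole process
        insn(0x06, 0, 0, ALLOW),             # everything else proceeds
    ])

    class SockFprog(ctypes.Structure):
        _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.c_char_p)]

    fprog = SockFprog(len(prog) // 8, prog)
    libc.prctl(38, 1, 0, 0, 0)               # PR_SET_NO_NEW_PRIVS
    libc.prctl(22, 2, ctypes.byref(fprog))   # PR_SET_SECCOMP, SECCOMP_MODE_FILTER

    os.execv("/bin/true", ["true"])          # dies with SIGSYS instead of running

Run your workload (or QEMU itself) under a filter like this and a compromise can no longer spawn new programs; it has to live entirely inside the code that is already executing.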

toast0 9 hours ago | parent | next [-]

> Suppose you control the entire stack though, from the bare metal up. (Correct me if I'm wrong, but) Toro doesn't seem to run on real hardware, you have to run it atop QEMU or Firecracker.

Some unikernels are intended to run under a hypervisor or on bare metal. Bare metal means you need some drivers, but if you have a use case for a unikernel on bare metal, you probably don't need to support the vast universe of devices, maybe only a few instances of a couple types of things.

I've got a not-at-all-production-ready hobby OS that's adjacent to a unikernel; it runs in virtio hypervisors and on bare metal, with support for one NIC. In its intended hypothetical use, it would boot from PXE, with storage on nodes running a traditional OS, so supporting a handful of NICs would probably be sufficient. Modern NICs tend to be fairly similar in interface, so if the manufacturer provides documentation, it shouldn't take too long to add support - at least once you've got one driver doing multiple tx/rx queues and all that jazz... plus or minus optimization.

For storage, you can probably get by with two drivers, one for sata/ahci and one for nvme. And likely reuse an existing filesystem.

eyberg 15 hours ago | parent | prev | next [-]

I can't speak for all the various projects but imo these aren't made for bare metal - if you want true bare metal (metal you can physically touch) use linux.

One of the things that might not be so apparent is that when you deploy these to something like AWS, all the users/process mgmt/etc. gets shifted up and out of the instance you control and into the cloud layer - I feel that would be hard to do with physical boxen because it becomes a slippery slope of certain operations (such as updates) needing auth, for instance.

laurencerowe 13 hours ago | parent | prev [-]

> In that case, what difference does it make if your application makes I/O requests through paravirtualized interfaces of the hypervisor or talks directly to the host via system calls?

Hypervisors expose a much smaller API surface area to their tenants than an operating system does to its processes which makes them much easier to secure.

Veserv 12 hours ago | parent [-]

That is an artifact of implementation. Monolithic operating systems with tons of shared services expose a lot to their tenants. Austere hypervisors - the ones with small API surface areas - basically implement a microkernel interface, yet expose significantly more surface area and offer a significantly worse guest experience than microkernels do. That is why high-security systems designed for multi-level security with shared tenants, which need to protect against state actors, use microkernels instead of hypervisors.

j-krieger 13 hours ago | parent | prev | next [-]

> Unfortunately containers have always had an absolutely horrendous security story and they degrade performance by quite a lot.

This is demonstrably untrue.

eyberg 13 hours ago | parent [-]

Let's see: last month (November 2025) alone we had CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881. Container breakouts happen almost monthly.

eikenberry 12 hours ago | parent | next [-]

I think they were talking more about the degraded performance.

In terms of the security aspects though, how do security holes in a layer that restricts things more than running without it degrade security? It seems like saying that CVEs in a browser's javascript sandbox degrade the browser's security more than just not having a sandbox.

eyberg 12 hours ago | parent [-]

The duplicated networking and storage layers that containers, and orchestrators such as k8s, add on top of the existing storage/networking layers absolutely degrade performance - full stop. No one runs containers raw (w/out an underlying vm) in the cloud - they always exist on top of vms.

The problem with "container" security is that even in this thread many people seem to think it is a security barrier of some kind, when it was never designed to be one. The v8 sandbox was specifically created to deal with sandboxing. It still has issues, but at least it was thought about and a lot of engineering went into it.

Container runtimes, by contrast, are not something the kernel exports - unshare is not named 'create_container'. There are over a half-dozen different namespaces, used in different manners, that expose hard-to-understand gotchas. The various container runtimes each decide for themselves how to deal with these, and they have to handle all the resulting issues in their own code. A very common class of bugs these runtimes get hit by is TOCTOU (time-of-check to time-of-use) vulns.
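
To make the namespace point concrete: at the kernel level a "container" is roughly just a process asking for fresh namespaces one syscall at a time, with every runtime left to compose and police them itself. A minimal sketch, assuming Linux and Python 3.12+ (for os.unshare), run as root since CLONE_NEWUTS needs CAP_SYS_ADMIN:

    import os, socket

    os.unshare(os.CLONE_NEWUTS)              # new UTS namespace for this process
    socket.sethostname("looks-contained")    # only this namespace sees the change
    print(socket.gethostname())              # the host's hostname is untouched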

Right now there is a conversation about the upcoming change to systemd that runs sshd on vsock by default (you literally have to disable it via a kernel cli flag - systemd.ssh_auto=no) - guess what one of the concerns is? Vsock isn't bound to a network namespace. That is not itself a vulnerability, but it most definitely is going to get taken advantage of in the future.

ritcgab 8 hours ago | parent | prev [-]

All specific to runc.

ahepp 12 hours ago | parent | prev | next [-]

> other architectural concepts such as the complete lack of an interactive userland is far more beneficial when you consider what an attacker actually wants to do after landing on your box

What does that have to do with unikernel vs more traditional VMs? You can build a rootfs that doesn't have any interactive userland. Lots of container images do that already.

I am not a security researcher, but I wouldn't think it would be too hard to load your own shell into memory once you get access to it. At least, compared to pulling off an exploit in the first place.

I would think that merging kernel and user address spaces in a unikernel would, if anything, make it more vulnerable than a design using similar kernel options that did not attempt to merge everything into the kernel, since now every application exploit is a kernel exploit.

eyberg 12 hours ago | parent [-]

A shell by design is explicitly made to run other programs. You type in 'ls', 'cd', 'cat', etc. but those are all different programs. A "webshell" can work to a degree as you could potentially upload files, cat files, write to files, etc. but you aren't running other programs under these conditions - that'd be code you're executing - scripting languages make this vastly easier than compiled ones. It's a lot more than just slapping a heavy-handed seccomp profile on your app.

Also, merging the address space is not a necessity. In fact, 64-bit (which essentially all modern cloud software targets) mandates virtual memory to begin with, and many unikernel projects support ELF loading.

pjmlp 16 hours ago | parent | prev | next [-]

Linux containers you mean.

The story is quite different in HP-UX, Aix, Solaris, BSD, Windows, IBM i, z/OS,...

ripdog 15 hours ago | parent [-]

Windows has containers?

m132 15 hours ago | parent | next [-]

Yes.

There are AppContainers. Those have existed for a while and are mostly targeted at developers intending to secure their legacy applications.

https://learn.microsoft.com/en-us/windows/win32/secauthz/app...

There's also Docker for Windows, with native Windows container support. This one is new-ish:

https://learn.microsoft.com/en-us/virtualization/windowscont...

jayd16 15 hours ago | parent [-]

Windows containers are actually quite nice once you get past a few issues. Perf is the biggest, as they seem to run in a VM on Windows 11.

Perf is much better on Windows Server. It's actually really pleasant to get your office appliances (a build agent etc.) into containers on a beefy machine running Windows Server.

mananaysiempre 13 hours ago | parent [-]

> Perf is the biggest as it seems to run in a VM in windows 11.

Doesn’t “virtualization-based security” mean everything does, container or no? Or are they actually VMs even with VBS disabled?

ironhaven 15 hours ago | parent | prev [-]

With a standard Windows Server license you are only allowed to run two Hyper-V virtual machines, but unlimited "windows containers". The design is similar to Linux, with namespaces bolted onto the main kernel, so they don't provide any better security guarantees than Linux namespaces.

Very useful if you are packaging trusted software and don't want to upgrade your Windows Server license.

pixl97 15 hours ago | parent | prev | next [-]

> what an attacker actually wants to do after landing on your box.

Aren't there ways of overwriting the existing kernel memory, or extending it to contain a new application, if an attacker is able to compromise the running unikernel?

What protections are provided by the unikernel to prevent this?

eyberg 15 hours ago | parent | next [-]

To be clear, there are still numerous attacks one might lob at you. For instance, if you are running a node app and the attacker uploads a new js file that they can have the interpreter execute, that's still an issue. However, you won't be able to start running random programs, like curling down some cryptominer or something - it'd all need to be contained within that code.

What becomes harder is if you have a binary that forces the attacker to rewrite the program in memory, as you suggest. That's where classic page protections come into play, such as not exec'ing rodata, not writing to text, not exec'ing heap/stack, etc. Just to note that not all unikernel projects have these, and even if they do it might be trivial to turn them off. The kernel I'm involved with (Nanos) has other features such as 'exec protection', which prevents the app from exec-mapping anything not already explicitly mapped exec.
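
A sketch of that W^X idea, assuming Linux on x86-64 and using ctypes to reach libc's mprotect: a page an attacker can write is not executable, and the bytes only become runnable after an explicit W -> X remapping - exactly the transition an exec-protection policy can veto:

    import ctypes, mmap

    libc = ctypes.CDLL(None, use_errno=True)
    PROT_READ, PROT_EXEC = 1, 4

    page = mmap.mmap(-1, mmap.PAGESIZE, prot=mmap.PROT_READ | mmap.PROT_WRITE)
    buf = (ctypes.c_char * mmap.PAGESIZE).from_buffer(page)
    buf[0] = b"\xc3"                         # x86-64 'ret', standing in for shellcode

    addr = ctypes.addressof(buf)             # page-aligned; jumping here now faults
    libc.mprotect(ctypes.c_void_p(addr), mmap.PAGESIZE, PROT_READ | PROT_EXEC)
    ctypes.CFUNCTYPE(None)(addr)()           # runs the single 'ret' and returns
    print("executed only after the explicit W -> X transition")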

Running arbitrary programs, which is what a lot of exploit payloads try to achieve, is pretty different from having to stuff whatever they want to run inside the payload itself. For example, if you look at most malware it's not just one program that gets run - it's like 30. Droppers exist solely to load third-party programs onto compromised systems.

ignoramous 13 hours ago | parent [-]

> The kernel I'm involved with (Nanos) has other features such as 'exec protection' which prevents that app from exec-mapping anything not already explicitly mapped exec.

Does this mean JIT (and I guess most binary instrumentation (debuggers) / virtualization / translation tech) won't run as expected?

eyberg 13 hours ago | parent [-]

We don't enable that exec-protect feature by default explicitly for this reason. You are right - JIT needs it.

wmf 15 hours ago | parent | prev [-]

If the stack and heap are non-executable and page tables can't be modified then it's hard to inject code. Whether unikernels actually apply this hardening is another matter.

catlifeonmars 11 hours ago | parent [-]

Isn’t this where ROP gadgets come in?

wmf 11 hours ago | parent [-]

ASLR makes ROP much harder - the attacker first needs an info leak to locate gadgets. Whether unikernels actually use ASLR is another matter.

dheera 15 hours ago | parent | prev [-]

I always thought of Docker as a "fuck it" solution. It's the epitome of giving up. Instead of some department at a company releasing a libinference.so.3 and a libinference-3.0.0.x86_64.deb, they ship some docker image that does inference and call it a microservice. They write that they launched, get a positive performance review, get promoted, and the Docker containers continue to multiply.

Python package management is a disaster. There should be ways of having multiple versions of a package coexist in /usr/lib/python, nicely organized by package name and version number, and import the exact version your script wants, without containerizing everything.

Electron applications are the other type of "fuck it" solution. There should be ways of writing good-looking native apps in JavaScript without actually embedding a full browser. JavaScript is actually a nice language to write front-ends in.

catlifeonmars 11 hours ago | parent | next [-]

> Python package management is a disaster. There should be ways of having multiple versions of a package coexist in /usr/lib/python, nicely organized by package name and version number, and import the exact version your script wants, without containerizing everything.

Have you tried uv?

dheera 10 hours ago | parent [-]

Well sure, every language has some band-aid. The real solution should have been Python itself supporting:

    import torch==2.9.1
Instead of a bunch of other useless crap additions to the language, this should have been a priority, along with the ability for multiple versions to coexist on PYTHONPATH.
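
The closest thing today lives outside the language: PEP 723 inline script metadata, which tools like uv act on when running a script. Roughly the equivalent of the above:

    # /// script
    # dependencies = ["torch==2.9.1"]
    # ///
    import torch

Running it with "uv run script.py" resolves and installs exactly that version into a throwaway environment first.
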
soulofmischief 15 hours ago | parent | prev | next [-]

There is a vast amount of complexity involved in rolling things from scratch today in this fractured ecosystem and providing the same experience for everyone.

Sometimes, the reduction of development friction is the only reason a product ends up in your hands.

I say this as someone whose professional toolkit includes Docker, Python and Electron; not necessarily tools of choice, but I'm one guy trying to build a lot of things, and life is short. This is not a free lunch, and the optimizer within me screams whenever performance is left on the table, but everything is a tradeoff. I'm always looking for better tools, and keep my eyes on projects such as Tauri.

ahepp 12 hours ago | parent | prev | next [-]

I think there's merit to your criticisms of the way docker is used, but it also seems like it provides substantial benefits for application developers. They don't need to beg OS maintainers to update the package, and they don't need to maintain builds for different (OS, version) targets any more.

They can just say "here's the source code, here's a container where it works, the rest is the OS maintainer's job, and if Debian users running 10 year old software bug me I'm just gonna tell them to use the container"

dheera 12 hours ago | parent [-]

Yeah I'm not against Docker in its entirety. I think it is good for development purposes to emulate multiple different environments and test things inside them, just not as a way to ship stuff.

nineteen999 15 hours ago | parent | prev | next [-]

Agree on all fronts. The advent of Dockerfiles as a poor man's packaging system, together with the per-language package managers, has set the industry back several years in some areas IMHO.

catlifeonmars 11 hours ago | parent [-]

> and the per-language package managers has set the industry back several years in some areas IMHO

Curious, can you expand on this?

nineteen999 2 hours ago | parent [-]

Python has what, half a dozen mostly incompatible package managers? Node? Ruby? All because they're too lazy, inexperienced or stubborn to write or automate RPM spec files, and/or Debian rules files.

To be fair, the UNIX wars probably inspired this in the first place - outside of SVR4 derivatives, most commercial UNIX systems (HP-UX, AIX, Tru64) had their own packaging format. Even the gratis BSD systems all have their own variants of the same packaging system. This was the one thing that AT&T and Sun's Solaris got right. Linux distros merely followed suit at the time - Red Hat with RPM, Debian with DEB, and then Slackware and half a dozen other systems - thankfully we seem to have coalesced on RPM, DEB, Flatpak, Snap, AppImage, etc... but yeah, that's before you get to the language-specific package management. It's a right mess, carried over from 90's UNIX "NIH" syndrome.

fragmede 11 hours ago | parent | prev [-]

> JavaScript is actually a nice language to write front-ends in.

I've written my fair share of GUIs, and React (and thus Javascript) is great compared to, I don't know, PHP, but CSS is the absolute devil.
