The big things to avoid are crossing the user/kernel divide and communication across cores.

Staying in the kernel is approximately the same as bypassing the kernel (caveats apply); for a packet filtering / smoothing use case, I don't think kernel bypass is needed. You probably want to tune NIC hashing so that inbound traffic for a given shaping queue arrives in the same NIC rx queue; but you probably want that in a kernel bypass case as well. Userspace is certainly nicer during development, as it's easier to push changes, but in 2026, it feels like traffic shaping has pretty static requirements and letting the kernel do all the work feels reasonable to me.

Otoh, OpenBSD is pretty far behind the curve on SMP and all that (I think their PF now has support for SMP, but maybe it's still in development?; I'd bet there's lots of room to reduce cross core communication as well, but I haven't examined it). You can't pin userspace cores to cpus, I doubt their kernel datastructures are built to reduce communications, etc. Kernel bypass won't help as much as you would hope, if it's available, which it might not be, because you can't control the userspace to limit cross core communications.

▲

rpcope1 4 hours ago | parent [-]

Just a single data point, but the BSDs in general, as much as people like to jerk them off, having tested both recent FreeBSD (which should be much faster than OpenBSD) and Debian on I guess the now kind of elderly APU2s I have, netfilter is noticably faster (and I find nftables to be frankly less challenging than pf) and gets those devices right at gigabit line speed even with complex firewall rules, where as pf leaves performance on the table. It probably has to do with the fact it's an older 4 core design that wasn't super high power to begin with (does still does its job extremely well), but still.

	▲	toast0 3 hours ago \| parent [-]
		One issue I've seen from a fair number of people on the APU2s running FreeBSD is if they've got PPPoE; inbound traffic (at least) all hashes to the same RX queue, and as a result there's no parallelism... if you're on gigE fiber with PPPoE, the APU2 can't really keep up single threaded. The earlier APU (1) boards use realtek nics that I think only have a single queue, so you won't get effective parallelism there either. If I'm finding the right information, APU2s with i210 have 4 rx queues which is well matched with a quad core, but those with i211 only have 2 rx queues, which means half of the processors will have nothing to do unless your kernel redistributes packets after rxing, but that comes at a cost too. Linux may have a different packet flow, or netfilter could be faster than pf. > I find nftables to be frankly less challenging than pf I also don't really care for how pf specifies rules. I would rather run ipfw, but pf has pfsync whereas ipfw doesn't have a way to do failover with state synchronization for stateful firewalls/NAT. So I figured out how to express my rules in pf.conf; because it was worth it, even if I don't like it :P