| ▲ | Kubernetes egress control with squid proxy (interlaye.red) |
| 67 points by fsmunoz 9 hours ago | 36 comments |
| |
|
| ▲ | merpkz 8 hours ago | parent | next [-] |
You don't need a sidecar to stream squid's logs; that's an anti-pattern. Instead, just tell squid to write its logs to /dev/stdout, like this:

    logfile_rotate 0
    cache_log stdio:/dev/stdout
    access_log stdio:/dev/stdout
    cache_store_log stdio:/dev/stdout

Running squid in a container is a bit tricky, since it is indeed an ancient piece of software, but I have managed to run it successfully before with a squid configuration like this:

    max_filedescriptors 1048576
    pid_filename /dev/shm/squid.pid
    cache_effective_user squid
    cache_effective_group squid

and the deployment has these set (UID 31 is the squid user inside the container):

    securityContext:
      runAsUser: 31
      runAsGroup: 31
      fsGroup: 31
    command: ["sh","-c","squid -z && sleep 3s; squid -N"]
|
| |
| ▲ | fsmunoz 7 hours ago | parent | next [-] | | That's a more elegant approach. I usually just plow through obstacles, and the end result is not always ideal -- I like your approach better than the sidecar; I guess I was using sidecars for other things and that influenced my approach here. I'll try your suggestions out and update the article. Thank you for your comment, it already made sharing this worth it. | | |
| ▲ | merpkz 7 hours ago | parent [-] | | Don't even mention it. I have never used NetworkPolicy before, but now it seems like exactly the thing I am missing on my clusters to limit the blast radius if anything gets owned. It's quite incredible the amount of nftables firewall rules the k3s daemon just created for that example policy in your blog; now I am in a rabbit hole trying to figure out how this all actually works under the hood. Thanks for this writeup!
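For readers who haven't used NetworkPolicy either, a minimal sketch of this kind of egress policy: all pods in a namespace may talk only to the proxy (the egress-proxy namespace and port 3128 follow the proxy address mentioned elsewhere in the thread; the workload namespace and the DNS rule are assumptions):

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: egress-via-squid
      namespace: team-a              # hypothetical workload namespace
    spec:
      podSelector: {}                # applies to all pods in the namespace
      policyTypes:
        - Egress
      egress:
        - to:                        # allow only the squid proxy namespace
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: egress-proxy
          ports:
            - protocol: TCP
              port: 3128
        - to:                        # plus cluster DNS
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
          ports:
            - protocol: UDP
              port: 53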
| |
| ▲ | MrDarcy 6 hours ago | parent | prev [-] | | What is the purpose of putting the pid file into /dev/shm? I’ve never seen that before and am curious to learn more about the technique. | | |
| ▲ | merpkz 26 minutes ago | parent | next [-] | | None that I can remember; I was probably just testing something outside a container and left it like that.
Checking now, there is a /run/squid created by Alpine, so that could be used too. | |
| ▲ | parliament32 3 hours ago | parent | prev | next [-] | | It ensures that if another process is spawned, it knows there's already a running process and refuses to run. An old-school leader-election lease, in a sense. It's not necessary in a containerized (read: non-daemonized) environment. | |
| ▲ | chuckadams 5 hours ago | parent | prev [-] | | Files in /dev/shm go away on reboot. Using a PID file at all in kubernetes is kind of odd (containerized things tend to run in the foreground as PID 1), but given squid's age, I imagine it requires it. | | |
| ▲ | xorcist 5 hours ago | parent [-] | | Running squid in the foreground is "-N". It's not hard to find, there is a manpage and everything (ooh, ancient). |
|
|
|
|
| ▲ | kodama-lens 6 hours ago | parent | prev | next [-] |
Thanks for the write-up. It is indeed a simple and good solution for smaller workloads, and as already pointed out it has some limitations.
For devs the explicit configuration of HTTP_PROXY is annoying, so the last time I built an egress proxy on OpenShift I wrote a small mutating webhook that injects those env vars automatically into all pods (the sketch below shows the end result). OpenShift already does this automatically, but only for some system pods.
Right now I'm exploring Cilium's Egress Gateway, since it also handles non-HTTP connections and sits directly in the routing layer, but it has a learning curve.
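In effect, a webhook like that patches every admitted pod so each container carries the proxy settings; the end result looks roughly like this in the pod spec (the proxy URL follows the article's example elsewhere in the thread; the NO_PROXY value is hypothetical):

    env:
      - name: HTTP_PROXY
        value: http://squid.egress-proxy.svc.cluster.local:3128
      - name: HTTPS_PROXY
        value: http://squid.egress-proxy.svc.cluster.local:3128
      - name: NO_PROXY
        value: .svc,.cluster.local,10.0.0.0/8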
|
| ▲ | baobun 8 hours ago | parent | prev | next [-] |
Not just squid: mostly any HTTP proxy can be run in forward mode if you want. Caddy's "magic TLS" can be neat for this if you actually do want to dynamically intercept those HTTPS connections in an easy way; it's a use-case where Caddy really shines. You can go nuts trying to configure that cleanly in squid -- the docs (perhaps intentionally) make you work for the hidden knowledge of these dark arts. You also get modernities like builtin HTTP/2, HTTP/3, etc. Is nobody else bothered by squid's very lengthy restart time, or have I just never configured it properly? (Not to dunk on squid; it's otherwise mostly great, especially for its caching features)
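If you want to try the Caddy route, a minimal Caddyfile sketch; this assumes a custom Caddy build that includes the caddyserver/forwardproxy plugin, which is not part of stock Caddy:

    {
        # the plugin's directive has no default position
        order forward_proxy first
    }
    :3128 {
        # plain forward proxy; the plugin also supports
        # basic auth and probe resistance
        forward_proxy
    }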
| |
| ▲ | fsmunoz 7 hours ago | parent [-] | | I've used Caddy for some of my projects (e.g. https://github.com/fsmunoz/parlamentodb/blob/54e0b252485905e... ), but not for this intercept approach you mentioned; I will give it a look! I'm not bothered by restart times, but that's mostly because they have never been a priority... one thing I have half-done, though, is a controller that gathers per-namespace configs, and with that, reload times will become more of an issue. Part of the reason I chose Squid here was precisely because I found it interesting to reuse something that was such a staple of web architecture patterns. |
|
|
| ▲ | btreecat 8 hours ago | parent | prev | next [-] |
I like this approach! I am struggling to lock down a pod in my home cluster to allow local connections to its web UI but force all other connections through a VPN client. I'm going to investigate whether I could use squid for this; my next approach was going to involve a sidecar. One heads-up to the author: the text-based charts didn't render well on FF mobile. Text is meant to reflow based on screen size, typeface, etc., so I feel this is a great case for using a drawing/image instead. |
| |
| ▲ | fsmunoz 6 hours ago | parent | next [-] | | Thank you! Depending on what you mean by "lock down", this or something like it could work: you are essentially defining a single outbound communication path. In a way, your scenario was one of the reasons behind this experiment. I'll take a look at the overflow thing, although I'm not sure I will be able to fix it: I do have an image at the start which is an alternative to the text-based drawing, so nothing is lost. I use my own blogging solution that is essentially Texinfo (https://interlaye.red/Texiblog.html), so these blocks are the result of using an @example block (which is then converted into a preformatted block). I'm not sure this can be improved, apart from (as you said) using alternative images. | |
| ▲ | hosh 4 hours ago | parent | prev | next [-] | | I'm not sure I understand the issue. Wouldn't the pattern be to use a reverse proxy for ingress, with everything going through there into the backends? Keep the pod IP range not directly reachable from outside the cluster? | | |
| ▲ | btreecat an hour ago | parent [-] | | > I'm not sure I understand the issue.
>
> Wouldn't the pattern be to use a reverse proxy for ingress, with everything going through there into the backends? Keep the pod IP range not directly reachable from outside the cluster? If all connections were inbound, that would work fine. I'm trying to control traffic flowing in and out. |
| |
| ▲ | baobun 7 hours ago | parent | prev | next [-] | | Using an http proxy like squid (or apache/haproxy/caddy/envoy/trafficserver/freenginx) does sound like what you should do next. If you need the pod to do outbound connections as well as receive incoming traffic, usually that would be two different proxies (forward and reverse, respectively). Unless you do some fancy p2p service mesh. | |
| ▲ | brynx97 8 hours ago | parent | prev [-] | | I had challenges with split-DNS in my homelab k3s cluster trying to do this. I ended up just putting the apps in docker-compose on a VM that has static routes for my local homelab networks. I looked at tailscale to solve this since it has a kubernetes operator, but tailscale doesn't fit my use cases or work well with all of my devices. | | |
| ▲ | btreecat 8 hours ago | parent [-] | | > I had challenges with split-DNS in my homelab k3s cluster trying to do this. I ended up just putting the apps in docker-compose on a VM that has static routes for my local homelab networks. I looked at tailscale to solve this since it has a kubernetes operator, but tailscale doesn't fit my use cases or work well with all of my devices. I don't need tailscale for this; it seems like overkill. I would like to better understand why my combination of marked packets and SOCKS5 proxy is not fully working for certain UDP traffic. I also need to investigate whether disabling IPv6 will help. Using a VM or docker-compose when I have k3s feels like admitting defeat without understanding why. | |
| ▲ | brynx97 3 hours ago | parent | next [-] | | To each their own. I mostly figured out why, and I did not want to create too much tech debt in my homelab with brittle split-DNS and PostUp/PostDown wireguard configurations. I already had ansible and templates set up to move back to the VM and docker-compose. I did learn a fair bit about CoreDNS, so that was a worthwhile experiment. | |
| ▲ | btreecat an hour ago | parent [-] | | I didn't mean for you, I meant for me. I have truenas providing storage to my cluster but could easily just run a VM there. I think your approach is absolutely valid and I didn't mean to seem dismissive. Apologies. |
| |
| ▲ | baobun 7 hours ago | parent | prev [-] | | > I would like to better understand why my combination of marked packets and SOCKS5 proxy is not fully working for certain UDP traffic I think UDP support in SOCKS5 proxies and clients is very spotty, especially beyond DNS -- probably some bugs out there. That might go for UDP in more or less esoteric container networking setups too... If everything else fails, I've had the least hassle with socat, as well as just chucking workloads into a full VM (or into a container with --network=host) and using ip routes and policies.
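As a concrete example of the socat fallback, a single relay process per UDP port tends to behave more predictably than a SOCKS5 hop (the port and endpoint address here are hypothetical):

    # listen on UDP 51820 and relay each sender's datagrams
    # to the VPN endpoint; fork handles multiple peers
    socat UDP4-LISTEN:51820,fork UDP4:10.8.0.1:51820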
|
|
|
|
| ▲ | crimsonnoodle58 8 hours ago | parent | prev | next [-] |
We use squid for egress control on Kubernetes and have also written a controller that runs in a sidecar container next to squid, monitoring for custom CRDs such as whitelists. The controller then updates squid.conf and reloads squid. This allows pods/namespaces to define their own whitelists. The great thing about using squid and disabling DNS is that you can stop DNS and HTTP exfil but still allow certain websites to be accessible.
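The squid side of a setup like this can stay very small; a sketch of what the generated config might look like (the whitelist file path is hypothetical):

    # regenerated by the controller from the whitelist CRDs
    acl allowed_dst dstdomain "/etc/squid/whitelist.txt"
    http_access allow allowed_dst
    http_access deny all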
| |
| ▲ | fsmunoz 7 hours ago | parent [-] | | I guess you have just described what I was hinting at here: > Linked with several of the above (mainly the centralised configuration) is that when using ACL rules to limit communication to external domains, these are cumulative: all namespaces will be able to communicate with all whitelisted domains, even if they only need to communicate with some of them.
> These limitations point toward why more sophisticated solutions exist, after all; a follow-up article will explore using Squid’s include directive to enable per-namespace configuration, and in doing so, show why you’d eventually want a controller or operator to manage the complexity. ... which is actually a good thing. More than making something "new", it's great to hear that the overall approach is sound.
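For a rough idea of where that include-based direction leads, the per-namespace layout could look like this (the paths, subnet, and ACL names are hypothetical):

    # squid.conf
    include /etc/squid/namespaces/*.conf

    # /etc/squid/namespaces/team-a.conf -- one file per namespace
    acl ns_team_a src 10.42.1.0/24
    acl team_a_dst dstdomain .example.com
    http_access allow ns_team_a team_a_dst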
|
|
| ▲ | elnerd 2 hours ago | parent | prev | next [-] |
Would it be trivial to have an init container do CA injection? Maybe through a mutating admission controller? Then some CNI magic to redirect outbound traffic for transparent proxying? |
|
| ▲ | thewisenerd 4 hours ago | parent | prev | next [-] |
one of the non-intrusive approaches i have for this [1] is kubenetmon [2], which uses a kernel feature called nf_conntrack_acct to keep counters per (src, dst). it's not perfect [3] but gets the job done for me

[1] not as much "control" as it is "logging", of sorts; especially when you just need to answer "what is my cluster talking to?"

[2] https://github.com/ClickHouse/kubenetmon / https://clickhouse.com/blog/kubenetmon-open-sourced

[3] if you have a lot of short-lived containers, you're likely to run into something like this: https://github.com/ClickHouse/kubenetmon/issues/24

edit: clarifying [1]
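for context, nf_conntrack_acct is just a node-level sysctl, so you can poke at the same counters by hand (needs privileges on the node and the conntrack-tools package):

    # enable per-connection byte/packet accounting
    sysctl -w net.netfilter.nf_conntrack_acct=1
    # conntrack entries now carry bytes= and packets= counters
    conntrack -L | head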
|
| ▲ | lkglglgllm 3 hours ago | parent | prev | next [-] |
| How hard would this be in nginx/traefik/envoy/caddy/river/varnish? |
| |
| ▲ | fsmunoz 2 hours ago | parent [-] | | Depends on what you want; I touch upon that somewhat. To replicate this specific pattern you can replace Squid with something that fills the gap without any major changes -- nginx or Caddy, for example -- but you would have to make sure the feature set is adequate: I see Squid as being egress-first, where the others are ingress-first (nginx being used as an ingress controller, recently discontinued, but still...), so I do think that for this specific purpose it works quite well. As for Envoy and the others, I think they fit a different architecture that I sort of point to near the end, one that includes using a service mesh: Istio, for example, uses Envoy for its egress gateway, Cilium also has an Egress Gateway, etc. This, to me, would be a separate pattern though. |
|
|
| ▲ | oldsj 8 hours ago | parent | prev | next [-] |
I’ve been working on running agents (Claude Agent SDK) on k8s; this looks great for controlling their egress. |
| |
| ▲ | fsmunoz 7 hours ago | parent [-] | | You can certainly use the Squid ACLs to limit egress for agents. One of the current shortcomings (I explicitly mention it near the end) is that there's no per-namespace granularity, so you wouldn't be able to enforce it at a per-agent level -- but you would be able to establish, generally, that all agents only have access to a global whitelist. |
|
|
| ▲ | e-Minguez 7 hours ago | parent | prev | next [-] |
| This is great! The only downside is that the app needs to understand proxies. |
| |
| ▲ | fsmunoz 6 hours ago | parent | next [-] | | Yes! And this can partially be a limitation that helps, in the sense that it forces you to add that support. In this example, I had to spend some time with the Common Lisp dexador approach to make it work. I've added a "PROXY: " UI hint in the page at https://horizons.interlaye.red/ ; you will see that it says "-- PROXY: http://squid.egress-proxy.svc.cluster.local:3128 --". This was actually something from my debugging that I decided to keep. A follow-up article will likely address this limitation, though, and look into transparent proxying. That will involve nftables, sidecars, etc., and the further we go in this direction, the more installing a CNI that comes with this by default starts to make sense.
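To give a flavour of the transparent variant, the nftables side might look roughly like this sketch (the table name and port are hypothetical, Squid would need a matching "http_port ... intercept" line, and a real setup must also exempt the proxy's own traffic to avoid loops):

    # redirect outbound web traffic to a local transparent
    # proxy port instead of setting HTTP_PROXY in each app
    nft add table ip egress
    nft 'add chain ip egress output { type nat hook output priority -100; }'
    nft 'add rule ip egress output tcp dport { 80, 443 } redirect to :3129'
| |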
| ▲ | hosh 4 hours ago | parent | prev [-] | | Older versions of Istio used an init container to redirect inbound and outbound traffic from the main container to the Envoy sidecar. You still have to have some kind of admission hook to inject things if you want it automatic, but the apps don’t need to understand proxies. |
|
|
| ▲ | m1keil 8 hours ago | parent | prev | next [-] |
| Pragmatic and practical. I learned something, thanks. |
| |
| ▲ | fsmunoz 7 hours ago | parent [-] | | You are most welcome, and that was precisely what I aimed at. Thank you. |
|
|
| ▲ | jonstewart 5 hours ago | parent | prev | next [-] |
My team uses a squid proxy to control egress for AWS VPCs, all integrated into our CDK scripts. The CDK script states the allowlist (including AWS endpoints) for the VPC, and squid enforces it, including DNS. It works beautifully. Locking down egress is one of the best defense-in-depth measures, as it makes it difficult for threat actors to download their tools and talk to their C2. |
|
| ▲ | klohto 6 hours ago | parent | prev [-] |
| I have had great experience scripting and running http://mitmproxy.org for these purposes. I also have set it in production as a dumb caching proxy for upstream services (We do a lot dumb GETs to list/enumerate) |