Why your 'Private Google Access enabled' subnet still bills Cloud NAT(github.com)
3 points by hsin003 10 hours ago | 1 comment
hsin003 10 hours ago | parent [-]

We hit a Cloud NAT bill of ~$4,500/month (3.2 TiB/day at $0.045/GiB) on a project where we'd "enabled Private Google Access" on the subnet. The traffic was inference workloads pulling data from GCS — exactly what PGA is supposed to put on the private path for free.
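The numbers line up, for anyone checking: 3.2 TiB/day at the $0.045/GiB processing rate is almost exactly that monthly figure (quick sanity check, assuming a 30-day month):

```shell
# 3.2 TiB/day * 1024 GiB/TiB * $0.045/GiB * 30 days
awk 'BEGIN { printf "$%.0f/month\n", 3.2 * 1024 * 0.045 * 30 }'
# → $4424/month, i.e. the ~$4,500 above
```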

The host is a multi-tenant LXC setup (Containarium — open source, many isolated user containers on one VM behind a shared NAT IP). The cost-saving reason we run it that way is exactly the same reason this incident was sneaky: ~20 workloads sharing one egress IP means VPC flow logs and Cloud NAT metrics all point at the same place, with no native GCP way to attribute traffic per workload.

Where the platform earned its keep: it tracks per-container resource stats (CPU/memory/disk/network via veth counters) and exposes them in a web UI. We sorted containers by lifetime rx bytes, and the offender jumped out — xxx-container, 1.5 TB rx since boot. Two orders of magnitude above any other container.

Without per-container traffic accounting, we'd have been left correlating VPC flow log timestamps against ps and lsof on the host — the kind of investigation that takes a day, not five minutes.

Then we looked at the destination IP from VPC flow logs: 192.178.163.207 → tt-in-f207.1e100.net. Google-owned. That misled us at first: it looked like PGA was working, just to a different Google service. It wasn't.

The actual problem: PGA has two halves and GCP only surfaces one of them in the subnet UI.

1. Subnet flag — --enable-private-ip-google-access. We had this on.

2. DNS — *.googleapis.com has to resolve to the private VIP range 199.36.153.8/30 (or 199.36.153.4/30 for restricted). Without that, storage.googleapis.com resolves to a normal public Google IP, the route to PGA never gets used, and Cloud NAT processes every byte.
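Both halves can be checked from the CLI (subnet/region names are placeholders):

```shell
# Half 1: is the subnet flag actually on? Prints True or False.
gcloud compute networks subnets describe SUBNET_NAME \
  --region=REGION --format='value(privateIpGoogleAccess)'

# Half 2: what does the hostname resolve to from a VM inside the VPC?
# Anything outside 199.36.153.8/30 (or 199.36.153.4/30) means the
# public path — and Cloud NAT — is handling the traffic.
dig +short storage.googleapis.com
```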

The fix is a Cloud DNS private zone for googleapis.com attached to the VPC, with A and AAAA records pointing at the private VIP range (don't forget IPv6 — we hit that on attempt one and saw traffic go right back out the public path). Once that's in place, dig storage.googleapis.com from inside the VPC returns 199.36.153.11, traffic uses the private path, NAT bytes drop ~95%.
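The gcloud shape of that fix is roughly the following (zone and network names are placeholders; this follows Google's documented private.googleapis.com pattern of a wildcard CNAME plus A records on the target — check the PGA docs for the current AAAA values rather than trusting a comment here):

```shell
# Private zone for googleapis.com, visible only to this VPC
gcloud dns managed-zones create googleapis \
  --dns-name=googleapis.com. --visibility=private \
  --networks=MY_VPC --description="Route Google APIs over PGA"

# Wildcard -> private.googleapis.com, which carries the VIP records
gcloud dns record-sets create '*.googleapis.com.' --zone=googleapis \
  --type=CNAME --ttl=300 --rrdatas=private.googleapis.com.
gcloud dns record-sets create 'private.googleapis.com.' --zone=googleapis \
  --type=A --ttl=300 \
  --rrdatas=199.36.153.8,199.36.153.9,199.36.153.10,199.36.153.11

# Also create the AAAA record set for private.googleapis.com — skipping
# it is the IPv6 miss described above.
```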

Verification one-liner from a VM in the VPC:

$ dig +short storage.googleapis.com
199.36.153.11   # private VIP — good
# vs.
142.251.x.x     # public IP — your "PGA-enabled" subnet is doing nothing

The annoying part is that everything else looks correct. Subnet flag set, no external IP on the VM, destination is a Google IP, NAT gateway is healthy. There's no warning anywhere that DNS is the missing piece.

Worth checking on every VPC where you assumed PGA was doing its job.