| ▲ | whiatp 8 months ago | |
PMTU just doesn't feel reliable to me because of poorly behaved boxes in the middle. The worst offender I've had to deal with was AWS Transit Gateway, which just doesn't bother sending ICMP too big messages.

The second worst offender is, IMO, data center and ISP routers that generate ICMP replies on their CPU: large packets hit a rate-limited exception punt path out of the switch ASIC over to the cheapest CPU they could find to put in the box. If too many people are hitting that path at the same time, (maybe) no reply for you.

A rarer case, but a really frustrating one to debug, was when we had an L2 switch in the path with a lower MTU than the routers it was joining together. Without an IP-level stack, there is no generation of ICMP messages, and that thing just ate larger packets.

The even stranger case was when there was a Linux box doing forwarding that had segment offload left on. It was taking in several 1500 byte TCP packets from one side, smashing them into ~9000 byte monsters, and then tried to send those over a VPNish network interface that absolutely couldn't handle that. Even if the network in the middle bothered to generate the ICMP too big message, the source would have been thoroughly confused because it never sent anything over 1500. | ||
| ▲ | toast0 8 months ago | parent | next [-] | |
> The even stranger case was when there was a Linux box doing forwarding that had segment offload left on. It was taking in several 1500 byte TCP packets from one side, smashing them into ~9000 byte monsters, and then tried to send those over a VPNish network interface that absolutely couldn't handle that. Even if the network in the middle bothered to generate the ICMP too big message, the source would have been thoroughly confused because it never sent anything over 1500.

This is an old Linux TCP offloading bug: large receive offload (LRO) smooshes the inbound packets together, and then the result is too big to forward.

I had to track down the other side of this. FreeBSD used to resend the whole send queue if it got a too big message, even if the size did not change. Sending everything at once made it pretty likely that the broken forwarder would receive packets close enough together to do LRO, and the resulting oversized packets showed up as network problems. I don't remember where the forwarder was; somewhere far away, IIRC. | ||
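To make the arithmetic behind the coalescing concrete, here is a rough sketch. It assumes IPv4 and TCP headers of 20 bytes each with no options, and a batch of six packets; neither number comes from the thread, and the function names are made up for illustration. It is not a model of what the kernel actually does, only of why the merged packet ends up far larger than anything the sender transmitted.

```python
IP_HEADER = 20   # assumption: IPv4 header without options
TCP_HEADER = 20  # assumption: TCP header without options


def coalesced_size(packet_sizes):
    """Size of one packet that carries the TCP payloads of all input packets."""
    payload = sum(size - IP_HEADER - TCP_HEADER for size in packet_sizes)
    return payload + IP_HEADER + TCP_HEADER


def fits(packet_size, egress_mtu):
    """True if the packet can leave an interface with the given MTU."""
    return packet_size <= egress_mtu


# Six full-size packets from a sender that never exceeds 1500 bytes.
merged = coalesced_size([1500] * 6)
print(merged, fits(merged, 1500))
```

Under these assumptions the forwarder ends up holding a packet close to the ~9000 bytes described above, even though every packet on the wire from the source was 1500 bytes.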
| ▲ | cryptonector 8 months ago | parent | prev | next [-] | |
> PMTU just doesn't feel reliable to me because of poorly behaved boxes in the middle. The worst offender I've had to deal with was AWS Transit Gateway, which just doesn't bother sending ICMP too big messages.

Passive PMTUD does NOT depend on ICMP messages. | ||
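The comment doesn't say which mechanism it has in mind. Assuming it is something along the lines of packetization-layer path MTU discovery (RFC 4821), the sender probes with larger packets and treats an acknowledged probe as "fits" and a lost probe as "probably too big", so no ICMP is needed. A minimal sketch of that search follows. The `probe` callback and `make_fake_path` are hypothetical stand-ins for sending a real probe, and the default floor of 1280 is just the IPv6 minimum link MTU used as a convenient starting point. Real implementations also have to cope with loss that has nothing to do with size, which this ignores.

```python
def search_path_mtu(probe, low=1280, high=9000):
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes probe(low) succeeds and that any size above the path MTU fails.
    """
    if not probe(low):
        raise ValueError("even the smallest probe size did not get through")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid        # probe was acknowledged: the path carries mid
        else:
            high = mid - 1   # probe was lost: treat mid as too big
    return low


def make_fake_path(path_mtu):
    """Hypothetical path that silently drops oversized packets (no ICMP)."""
    def probe(size):
        return size <= path_mtu
    return probe


print(search_path_mtu(make_fake_path(1400)))
```

The fake path never sends an error back, which is the black-hole behaviour complained about in the parent comment; the search still converges because it only looks at whether probes were delivered.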
| ▲ | immibis 8 months ago | parent | prev | next [-] | |
L2 not generating errors is expected behaviour: all ports on the L2 network are supposed to have the same MTU set. | ||
| ▲ | Hikikomori 8 months ago | parent | prev [-] | |
They recently started supporting PMTUD on TGW. But it wasn't really a big deal, as it adjusted the MSS instead. | ||
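The comment doesn't describe how the gateway adjusts the MSS. The general technique, usually called MSS clamping, is for a box on the path to lower the MSS option in TCP SYN packets so that neither end ever builds a segment too large for the path. A sketch of the arithmetic, assuming headers without options; the function names are hypothetical:

```python
TCP_HEADER = 20  # assumption: TCP header without options


def mss_for_mtu(mtu, ipv6=False):
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = 40 if ipv6 else 20  # fixed IPv6 header vs. IPv4 without options
    return mtu - ip_header - TCP_HEADER


def clamp_mss(advertised_mss, path_mtu, ipv6=False):
    """MSS a middlebox would write into the SYN: never raised, only lowered."""
    return min(advertised_mss, mss_for_mtu(path_mtu, ipv6))


# A host on a 1500-byte link advertises 1460; the path only carries 1400.
print(clamp_mss(1460, 1400))
```

This only helps TCP. UDP and other protocols crossing the same path still depend on PMTUD or on the application picking a safe size.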