| ▲ | zamadatix 8 months ago | ||||||||||||||||||||||||||||||||||
Most carrier/enterprise/hardware IPv4 routers, particular those on the internet, will not actually perform IPv4 fragmentation on behalf of the client traffic even though it's allowed by the IPv4 standard. Typically fragmentation is reserved for boxes which already have another reason to care about it (such as needing to NAT or inspect the packets) or the client endpoints themselves. I.e. the internet will (sparing security middleboxes) allow arbitrary IPv4 fragments through but it won't typically turn a 8000 byte packet into 6 fragments to fit through a 1500 byte MTU limitation on behalf of the clients. E.g. if you send a 1500 byte IPv4 ping without DF set to a cellular modem or someone with a DSL modem using PPPoE it'll almost always get dropped by the carrier rather than fragmented. Of course nothing is stopping you from labbing it up at home. Firewalls and software routers can usually be made to do refragmentation. | |||||||||||||||||||||||||||||||||||
| ▲ | ay 8 months ago | parent [-] | ||||||||||||||||||||||||||||||||||
Of course on the carrier boxes the fragmentation is done also not inline, so its behavior will depend on the aggressiveness of the CoPP configuration, and will be subject to the same pitfalls as the ICMP packet too big generation. Thanks for keeping me straight here! Based on the admittedly old study at [0] seems like some carriers just don’t bother to fragment, indeed - but by far not all of them. Firewalls might do virtual reassembly, so the trick with the initial fragment won’t fly there. This MTU subject is interesting for me because I have a little work in progress experiment: https://gerrit.fd.io/r/c/vpp/+/41914/1/src/plugins/pvti/pvti... (the code itself is already in, but has a few crashy bugs still and I need to take make it not suck performance wise, but that is my attempt to revisit the issue of MTU for tunnel use case. The thesis is that keeping the 5-tuple will make “chunking”/“de-chunking” at tunnel endpoints much much simpler on the endpoints of the tunnel. The source of inspiration was a very practical setup at [1], which is, while looking horrible in theory (locally fragmented GRE over L2TP), actually gives a decent performance with 1500-byte end to end MTU over the tunnel. The open question is which inner MTU will be sane, taking into account the increased probability of loss with bigger inner MTU… intuitively seems like something like ~2.5K should just double the loss probability (because it’s 2x packets) and might be a workable compromise in 2025…. One could also do the same trick over QUIC, of course, but i wanted something tiny and easier to experiment with - and the ability to go over IPSec or wireguard as well as a secured underlay. [0] https://labs.ripe.net/author/emileaben/ripe-atlas-packet-siz... | |||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||