zamadatix 8 months ago
> Your first point appears to be about physical layer concerns. My suggestion was not meant to operate at that level.

The proposal doesn't operate at that level, but it must be compatible with the operations of that level. I.e. the fact that the physical layer can also truncate the layers riding on top of it needs to be accounted for in the way those upper layers consider what truncation means. The same is true for possible intermediate layers (which sorta aligns with the later conversations regarding tunnels, which are basically just more complicated forms of intermediate layers).

> The proposed model assumes the physical layer guarantees point-point delivery of a distinct packet between adjacent nodes in the network with MTU limits manifesting as either discarding or rejecting the trailing portions of the packet.

Then the proposal isn't applicable to IP, since an upper layer protocol cannot make guarantees about the behavior of the lower level protocols it may be transported on. In addition, discarding trailing portions of the packet still results in the aforementioned problems with consistency checks and forwarding behavior limitations, even for lower level layers which did abide by this behavior.

> Unidirectional protocols with no back channel

One cannot guarantee bidirectional protocols will be able/allowed to form a back channel either; I just used unidirectional as a more clear-cut example.

> You can just not use it and act as if truncation is dropping if you want to. This is just strictly more data you can use for decisions.

Well sure, the same is true of the ICMP method or an active probing method. The concern is less with sessions you don't care to PMTUD in the first place and more with how the truncation design affects the designs of such other use cases.

> You can still get authenticated transport in the presence of truncation if your protocol generates an authentication tag for the “original” length and puts it at the start of the message.
Then you can authenticate the length field and verify truncation; otherwise you can drop it.

I totally agree one can include an HMAC tag in your client<->client protocol to validate that unmodified packets are authentic. This holds regardless of whether truncation, ICMP packet too big, active PMTUD probing, or any other method is in place because, up to this point, this is only about validating delivered packets which did fit in the MTU. What isn't clicking is how, when a truncated message arrives, a (now invalid) HMAC helps you determine whether this packet was completely spoofed by a malicious actor or really truncated by a middlebox. All you know is that it was supposed to be longer and now something claims it needs to be shorter. How do you know that's not because of the same malicious actor who was supposed to be sending the fake ICMP packet too big, rather than a middlebox really trying to signal that the packet truly needed to be truncated?

> I did not bother with tunnels because I do not see how it is a distinct problem.

As highlighted earlier, tunnels may either encapsulate other protocols or encapsulate protocols which are expecting truncation. If the only things which existed in the world were client network interfaces it wouldn't be a problem, but once more network devices become involved you have to consider the impact on those too. The main thing to keep in mind is that very few network middleboxes or tunnel protocols have the ability to do fragmentation on behalf of tunneled data, particularly if they are hardware based or based on protocols without such a feature (such as Ethernet), since this eats up TONS of hardware to do so (especially at high speeds). E.g. take an IPv6 VXLAN tunnel of an Ethernet frame on a 400 Gbps interface: how is a pure L3 intermediate carrier router doing truncation supposed to know to update the UDP (a layer up the stack) checksum so the truncated Ethernet payload actually gets delivered to the client destination from the egress VTEP?
It's not even that the egress VTEP needs some way to signal to the ingress VTEP how much the truncation was; it's that the original client whose frame was VXLAN encapsulated by the ingress VTEP needs its packet delivered to the remote client, so the remote client can see the truncation and re-negotiate (in band or out of band) with the client to send smaller frames. This signaling will not occur because of the aforementioned UDP checksum being broken by an intermediate router. Just removing all checksums, allowing all modifications to headers, and delivering whatever arrives would create not only a high incidence of deformed traffic being propagated but also security risks.

This brings us back to the example of secure tunnels, like IPsec, which have the same problem but in a much more succinct form. All parts of the payload of an IPsec tunnel are basically random noise after you truncate it, so there is no way to even attempt to consider sending the truncated payload to the intended destination. It's not the responsibility of the IPsec encapsulator to perform the encapsulation, and the IPsec receiver usually doesn't have a path to communicate with the original client (not that it even knows who that is).

If you redesign everything about how network tunneling works under some severe limitations and assumptions, then it may be possible to solve some of these problems (or maybe all of them, if I can figure out what I'm missing regarding authentication of packets claiming MTU changes), but I'm not sure I could ever see the set of requirements needed as easier than the other MTU approaches. That doesn't necessarily mean I think there is an overall perfect answer at all, just that I think PMTUD and its variants are definitely the easier path.
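To make the checksum point concrete, here is a minimal Python sketch. It assumes a toy datagram layout (16-bit length, 16-bit checksum, payload) rather than real UDP or VXLAN headers, and the names `build_datagram` and `verify` are hypothetical. It only illustrates that a hop which cuts the tail without recomputing an end-to-end checksum leaves the receiver holding a datagram that fails verification.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Ones-complement 16-bit checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_datagram(payload: bytes) -> bytes:
    """Toy layout (an assumption, not real UDP): length, checksum, payload."""
    length = struct.pack("!H", len(payload))
    csum = internet_checksum(length + payload)
    return length + struct.pack("!H", csum) + payload

def verify(datagram: bytes) -> bool:
    """Recompute the checksum over the bytes that actually arrived."""
    length = datagram[:2]
    (csum,) = struct.unpack("!H", datagram[2:4])
    return internet_checksum(length + datagram[4:]) == csum

payload = bytes(range(256)) * 16   # 0x1000 bytes of sample data
full = build_datagram(payload)
truncated = full[:0x500]           # tail cut by a hop that leaves the checksum alone

print(verify(full))       # True
print(verify(truncated))  # False
```

In a real VXLAN case the checksum would also cover a pseudo-header and the encapsulated frame, so the truncating router would need to understand a layer it normally does not parse in order to fix it up.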
Veserv 8 months ago
I just do not understand the problems you are stating. Let me present a concrete example.

We have A <-> B <-> C. A wishes to transmit a packet of 0x1000 bytes containing an Ethernet header, an IPv4 header, and then a bespoke protocol, P, which is laid out as a header containing a length, a MAC on the length + header, a MAC on the entire packet, and the encrypted payload, in that order.

A then prepares transmit descriptors pointing at the packet with a size of 0x1000 bytes. C prepares receive descriptors pointing to buffers with a maximum capacity of 0x1000 bytes per packet. B prepares receive descriptors pointing to buffers with a maximum capacity of 0x500 (1280) bytes per packet.

A transmits the packet to B. The physical coding layer transmits the bytes, terminating in the FCS. B receives the bytes and does a running computation of the FCS. Upon reaching 0x500 bytes, it stops storing data into memory, stores the current FCS into memory, then continues receiving the data and computing the FCS until the data stream ends. Upon determining that the FCS matches, it marks the descriptor as valid for consumption and stores that the descriptor contains 0x500 bytes of data. The transmit engine of B then configures a transmit descriptor pointing at the packet with a size of 0x500 bytes.

C then receives the 0x500 byte packet from B, observes that the FCS matches the 0x500 byte FCS, marks the descriptor as valid for consumption, and stores that the descriptor contains 0x500 bytes of data.

C then processes the packet, observing that the P header indicates a length of 0x1000 bytes but only 0x500 bytes are available. It attempts to authenticate the P header MAC using a secret known only to A and C. As the truncation only hit the encrypted payload at the tail, the P header MAC and the header data it is authenticating have not been modified by the truncation process.
As such, C is able to use the higher layer secret it shares with A to successfully authenticate the header data and determine that the header containing a length field with the value of 0x1000 bytes could only have been written by A and has not been tampered with. It then rejects the rest of the packet, but stores that the inbound MTU is only 0x500 bytes.
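The receive-side check at C could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definition of P: the 4-byte length field, HMAC-SHA256 as the MAC, and the names `build_packet` and `receive` are all hypothetical, the Ethernet/IPv4 framing and FCS handling are omitted, and the payload is opaque bytes standing in for ciphertext.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32                   # HMAC-SHA256 tag size (assumed MAC)
HEADER_LEN = 4                 # assumed 32-bit length field
PREFIX_LEN = HEADER_LEN + 2 * TAG_LEN

def _mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def build_packet(key: bytes, payload: bytes) -> bytes:
    """Layout: length | MAC(header) | MAC(whole packet) | payload."""
    header = struct.pack("!I", PREFIX_LEN + len(payload))
    header_tag = _mac(key, header)
    packet_tag = _mac(key, header + header_tag + payload)
    return header + header_tag + packet_tag + payload

def receive(key: bytes, data: bytes):
    """Return (status, claimed_length); status is 'ok', 'truncated' or 'reject'."""
    if len(data) < PREFIX_LEN:
        return "reject", None  # too short to hold the authenticated header
    header = data[:HEADER_LEN]
    header_tag = data[HEADER_LEN:HEADER_LEN + TAG_LEN]
    if not hmac.compare_digest(header_tag, _mac(key, header)):
        return "reject", None  # length field was not written by the key holder
    (claimed,) = struct.unpack("!I", header)
    if len(data) < claimed:
        # Header is authentic but the tail is missing: drop the payload,
        # and treat len(data) as the observed inbound limit at this layer.
        return "truncated", claimed
    packet_tag = data[HEADER_LEN + TAG_LEN:PREFIX_LEN]
    payload = data[PREFIX_LEN:claimed]
    expected = _mac(key, header + header_tag + payload)
    return ("ok" if hmac.compare_digest(packet_tag, expected) else "reject"), claimed

KEY = b"secret shared by A and C"
packet = build_packet(KEY, bytes(0x1000 - PREFIX_LEN))       # 0x1000 bytes total
truncated = packet[:0x500]                                   # what C sees after B
forged = struct.pack("!I", 0x2000) + truncated[HEADER_LEN:]  # altered length field

print(receive(KEY, packet))     # ('ok', 4096)
print(receive(KEY, truncated))  # ('truncated', 4096)
print(receive(KEY, forged))     # ('reject', None)
```

The sketch only covers the claim made in the example, that the header and its length survive tail truncation and can still be authenticated. It does not by itself say anything about who performed the truncation.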