From a pragmatic standpoint: manually hard-coding a safe minimum is the only approach that consistently works.

PMTUD somehow missed that packet networks ditching the OOB mechanisms of circuit switched networks was a good thing. By adding an OOB mechanism of attempted MTU discovery. Unauthenticated.

Yes, matching the 5-tuple from the original payload helps somewhat against the obvious security problem with this. (It was a fun 3-4 years while that check was being added to systems across the ‘net and everyone was blocking ICMP outright to avoid exploitation. One might still find the burps of that in some security guidelines.)
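To make the 5-tuple check concrete, here is a minimal sketch in Python, assuming IPv4 with TCP or UDP. The function names and the `active_flows` set (the host's own record of flows it has sent on) are hypothetical, made up for illustration. It pulls the quoted original header out of an ICMP "fragmentation needed" message (type 3, code 4) and accepts it only if the quoted 5-tuple is a flow we actually have.

```python
import socket
import struct

def quoted_five_tuple(quoted: bytes):
    """Parse the IPv4 header plus the first 4 bytes of TCP/UDP quoted inside
    an ICMP error. Returns (proto, src, sport, dst, dport), or None if malformed."""
    if len(quoted) < 20 or quoted[0] >> 4 != 4:
        return None
    ihl = (quoted[0] & 0x0F) * 4          # IPv4 header length in bytes
    if ihl < 20 or len(quoted) < ihl + 4:
        return None
    proto = quoted[9]
    if proto not in (6, 17):              # only TCP and UDP carry ports here
        return None
    src = socket.inet_ntoa(quoted[12:16])
    dst = socket.inet_ntoa(quoted[16:20])
    sport, dport = struct.unpack("!HH", quoted[ihl:ihl + 4])
    return (proto, src, sport, dst, dport)

def icmp_matches_flow(icmp_message: bytes, active_flows: set) -> bool:
    """icmp_message starts at the ICMP header: type, code, checksum,
    2 unused bytes, 2 bytes of next-hop MTU, then the quoted packet."""
    if len(icmp_message) < 8:
        return False
    if (icmp_message[0], icmp_message[1]) != (3, 4):
        return False
    return quoted_five_tuple(icmp_message[8:]) in active_flows
```

This only raises the bar: anyone on-path, or anyone who can guess the tuple, can still forge a matching message, which is why it "helps somewhat" rather than solves the problem.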

But the number of network admins who understand what they have to configure in their ACLs, and why, is scarily small compared to the overall pool size.

Here’s another hurdle: for about two decades now, generating ICMP has meant punting the packet from hardware forwarding to the slow path. Which gets rate-limited. Which gives one a fantastic way to create extremely entertaining and hard-to-debug problems: a single misbehaving or malicious flow can disable ICMP generation for everyone else.
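The starvation effect is easy to reproduce in a toy model. The sketch below assumes a single shared token bucket in front of ICMP generation (real platforms differ in how they scope their limiters); the class name, the rates, and the packet timing are all made up for illustration. Time and tokens use integer fixed-point so the run is deterministic.

```python
MICRO = 1_000_000  # microseconds per second, also the fixed-point scale for one token

class IcmpRateLimiter:
    """Hypothetical shared token bucket guarding slow-path ICMP generation."""
    def __init__(self, rate_per_s: int, burst: int):
        self.rate = rate_per_s
        self.cap = burst * MICRO
        self.tokens = self.cap
        self.last_us = 0

    def allow(self, now_us: int) -> bool:
        # Refill in micro-tokens: elapsed microseconds times tokens per second.
        self.tokens = min(self.cap, self.tokens + (now_us - self.last_us) * self.rate)
        self.last_us = now_us
        if self.tokens >= MICRO:
            self.tokens -= MICRO
            return True
        return False

def simulate(seconds: int, noisy_pps: int, victim_pps: int, limiter: IcmpRateLimiter):
    """Count how many oversized packets of each flow get an ICMP error back."""
    events = [(i * MICRO // noisy_pps, "noisy") for i in range(seconds * noisy_pps)]
    # The victim sends mid-second, 50 us after one of the noisy flow's packets.
    events += [(i * MICRO // victim_pps + MICRO // 2 + 50, "victim")
               for i in range(seconds * victim_pps)]
    events.sort()
    answered = {"noisy": 0, "victim": 0}
    for now_us, flow in events:
        if limiter.allow(now_us):
            answered[flow] += 1
    return answered

result = simulate(10, 10_000, 1, IcmpRateLimiter(rate_per_s=100, burst=10))
print(result)
```

In this run the noisy flow grabs every token the moment it becomes available, so the victim's one packet per second never gets an ICMP error back. With real-world jitter the victim would win occasionally, roughly in proportion to its share of the offending packets, which is exactly what makes it so hard to debug.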

Make hardware that can do it in the fast path? Even if you don’t punt, you still have to rate-limit to prevent an unauthenticated amplification attack (28 bytes of added headers is not comparable with some of the DNS or NTP scenarios, but not great anyway).

So, practically speaking, it can’t be relied on, other than as a source of great stories.

PLPMTUD is a little better, in the sense that it attempts to limit itself to in-band probes, but then there is the delicate dance around loss customarily being used to signal congestion.
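A minimal sketch of the in-band probing idea, assuming a caller-supplied `probe(size)` callback that sends a padded probe of that size and reports whether it was acknowledged. The function name, the defaults (1200 base, 1500 ceiling, 3 attempts per size) and the binary search strategy are illustrative choices, not what any particular stack does.

```python
from typing import Callable

def search_plpmtu(probe: Callable[[int], bool],
                  base: int = 1200, ceiling: int = 1500,
                  retries: int = 3, step: int = 4) -> int:
    """Binary-search the largest probe size that gets through.

    `base` is assumed to already work. A size counts as too big only after
    `retries` consecutive lost probes, because a single loss may just be
    congestion rather than an MTU limit."""
    lo, hi = base, ceiling
    while hi - lo >= step:
        mid = (lo + hi + 1) // 2
        if any(probe(mid) for _ in range(retries)):
            lo = mid          # acknowledged at least once: the path carries this size
        else:
            hi = mid - 1      # every attempt lost: treat as over the path MTU
    return lo

# Example: a path that silently drops anything above 1420 bytes.
print(search_plpmtu(lambda size: size <= 1420))
```

The retry count is where the "delicate dance" lives: too few retries and ordinary congestion loss shrinks the MTU for no reason, too many and every search step costs several timeouts.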

So this mechanism isn’t too reliable either, in ways that are very painful for the poor soul on call dealing with the outcomes. Ask me how I know... ;-)

Now, let’s add to this the extremely pragmatic and evil hack that is TCP MSS clamping, dating back to the first PPPoE days, which makes just enough of the eyeball traffic work to turn this into a “small problem with unimportant traffic that no one cares about anyway”.
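For reference, the arithmetic behind MSS clamping is small enough to sketch. This assumes option-free IP and TCP headers (20 bytes for IPv4, 40 for IPv6, 20 for TCP); the function name is made up, and a real middlebox does this by rewriting the MSS option in passing SYN segments.

```python
IP_HEADER = {4: 20, 6: 40}   # fixed header sizes, no options or extension headers
TCP_HEADER = 20              # TCP header without options

def clamp_mss(advertised_mss: int, egress_mtu: int, ip_version: int = 4) -> int:
    """Return the MSS a clamping middlebox would write into a passing SYN:
    never raise it, only lower it to what fits the egress MTU."""
    ceiling = egress_mtu - IP_HEADER[ip_version] - TCP_HEADER
    return min(advertised_mss, ceiling)

# Classic PPPoE case: 1500-byte Ethernet minus 8 bytes of PPPoE/PPP leaves 1492.
print(clamp_mss(1460, 1492))       # IPv4
print(clamp_mss(1440, 1492, 6))    # IPv6
```

Note that this only rescues TCP, and only where the middlebox can see and rewrite the SYN. UDP-based and encrypted transports get no help from it.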

So yes, safe minimums are a practical solution.

Until one starts to build tunnels, that is. A WireGuard tunnel inside an IPsec tunnel. Because policy. Inside a VXLAN tunnel inside another IPsec tunnel, because SD-WAN. Which traverses NAT64, because transition and address scarcity.

At which point the previously safe minimums might not be safe anymore and we are back to square one. I suspect that when folks start running QUIC over WireGuard/IPsec/VXLAN + IPv6 en masse, we will learn that (surprise!) 1200 was not a safe value after all.
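A back-of-the-envelope sketch of how quickly stacked encapsulation eats the budget. The WireGuard (60 bytes over IPv4) and VXLAN (50 bytes over IPv4) figures are the commonly quoted ones; the IPsec number is an assumed round figure, since real ESP overhead depends on cipher, mode, padding and NAT traversal. The layer names and the function are made up for illustration.

```python
# Per-layer overhead in bytes. Treat these as illustrative, not authoritative.
OVERHEAD = {
    "wireguard_v4": 60,  # 20 IPv4 + 8 UDP + 32 WireGuard framing and auth tag
    "wireguard_v6": 80,  # same, with a 40-byte IPv6 outer header
    "vxlan_v4": 50,      # 20 IPv4 + 8 UDP + 8 VXLAN + 14 inner Ethernet
    "ipsec_esp": 64,     # ASSUMPTION: round figure for ESP tunnel mode with NAT-T
    "nat64": 20,         # IPv4 -> IPv6 translation grows the header by 20 bytes
}

def inner_mtu(underlay_mtu: int, layers: list) -> int:
    """MTU left for the innermost payload after all listed encapsulations."""
    return underlay_mtu - sum(OVERHEAD[layer] for layer in layers)

stack = ["nat64", "ipsec_esp", "vxlan_v4", "ipsec_esp", "wireguard_v4"]
for underlay in (1500, 1280):
    left = inner_mtu(underlay, stack)
    print(underlay, left, "fits 1200" if left >= 1200 else "breaks 1200")
```

Under these assumed numbers the stack still leaves room for a 1200-byte datagram on a 1500-byte underlay, but not on an underlay that only guarantees the IPv6 minimum of 1280.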

So, with this in mind, I posit it’s nice to at least fantasize about a universe where MTU determination would be done entirely inline, even if hypothetical: if we had the benefit of today’s hindsight and could time travel, could we have made it better?

P.S. Unidirectional protocols could be taken care of by fountain codes, not unlike the I-, P- and B-frames in the video world, with similar trade-offs. Moreover, I feel the unequal probability of loss depending on the place in the packet might allow for some interesting tricks.

zamadatix 8 months ago | parent [-]

Agree wholeheartedly on the pragmatic standpoint of just using minimums.

With regard to the problems of out-of-band signaling in plain PMTUD, I fully agree with all your well-stated points, doubly so on PLPMTUD! PLPMTUD is my preferred variation of PMTUD and I was glad to see the datagram form utilized in QUIC (especially since it's really a generic secure network tunneling protocol, not just the HTTP variant). I'm also glad QUIC's security model naturally got rid of MSS clamping... it was somewhat pragmatic in one view... but concerning/problematic in others :D. Of course it's not like TCP/MSS clamping has exactly gone away though :/.

Also fully agree on both PLPMTUD still not being as reliable/fast as one would like (though I still think it's the best of the options) + safe minimums never seeming to stay "safe". At least IPv6 attempted to hedge this by putting pressure on network admins, saying "everyone is expecting 1280". Of course... we all know that doesn't mean every client ends up with 1280, particularly if they are doing their own VPN tunnel or something, but at least it gives us network guys an extra wall of "well, the standard says we need to allow an expectation of 1280, and the rate of bad things happening will be much higher below that".

You seem to have some really neat perspectives on networking, do you mind if I ask about what you do/where you got your experience? I came up through the customer side and over time morphed my way into NOS development at some network OEMs, and it feels like I run into fewer and fewer folks who deal with the lower layers of networking as time has gone on. I think the most "fun" parts are trying to design overlay/tunneling systems which are hardware compatible with existing ASICs or protocols but are able to squeeze some more cleverness out of the usage (or, as you put it, if we had the benefit of today’s hindsight and could time travel - could we have made it better). The area I'd say I've been least involved in, but would like to be, is anything to do with time-sensitive networking or lossless Ethernet use cases.

ay 8 months ago | parent [-]

> "everyone is expecting 1280"

This works great until there is an app that is expecting 1280 and there is an operator that gives you 1280, and you have to run this app over an encrypted Geneve tunnel that attempts to add half a kilobyte of metadata :-). RADIUS with EAP or DHCP with a bunch of options can be a good example of a user app like this. Unfortunately this is a real-world problem.

A smaller but nonetheless painful mismatch is the 20-byte difference between IPv4 and IPv6 header sizes. It trips up every NAT64 deployment.
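The 20-byte mismatch in numbers, as a minimal sketch: it assumes stateless header translation where the IPv4 header (options included) is replaced by a 40-byte IPv6 header, optionally followed by an 8-byte fragment header. The function name is made up for illustration.

```python
IPV6_HEADER = 40
IPV6_FRAGMENT_HEADER = 8

def translated_len_v4_to_v6(ipv4_total_len: int, ipv4_header_len: int = 20,
                            add_fragment_header: bool = False) -> int:
    """Size of the IPv6 packet produced by translating an IPv4 packet."""
    payload = ipv4_total_len - ipv4_header_len
    extra = IPV6_FRAGMENT_HEADER if add_fragment_header else 0
    return IPV6_HEADER + extra + payload

# A 1270-byte IPv4 packet no longer fits a 1280-byte IPv6 link once translated.
print(translated_len_v4_to_v6(1270))
# The largest IPv4 packet that still fits 1280 after translation:
print(max(n for n in range(1200, 1281) if translated_len_v4_to_v6(n) <= 1280))
```

So an IPv4 sender that believes it has 1280 to play with actually has 1260 on the IPv6 side of the translator, or less if a fragment header gets added.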

> where you got your experience?

A long path along all the OSI layers :-). Fiber and UTP network installs between ~95 and 2000. CCIE R&S #5423 in ‘99, then from 2000 almost 10 years in TAC; one of the first CCIEs in Europe. Then some years working on IPv6 transition. Large-scale IPv6 WiFi. Some folks know me by “happy eyeballs”; some by a “nats are good” YouTube video (the scariest thing is that it’s still funny a decade later). These days: relops at fd.io VPP + the internal CI/CD pipeline for a bunch of projects using VPP; and as a side gig, full-cycle automation of the switched fleet (~500 boxes) at #CLEUR installations. One of the recent fun projects was [0], probably an industry first at this scale for an event network: more than 15K WiFi clients on IPv6-mostly. Though we were benefitting from the work of a lot of folks who pushed the standardization and did smaller/more controlled deployments; in particular, huge thanks to Jen Linkova and Ondřej Caletka.

If you like low-level network stuff, you might like VPP - and given it’s Apache licensed, it’s pretty easy to use for your own project.

[0] https://www.ietf.org/proceedings/122/slides/slides-122-iepg-...

zamadatix 8 months ago | parent [-]

Agreed, still not perfect by any means.

One minor Ethernet MTU thing I would change with a time machine is to have the network header portion of the MTU be more like 802.11. I.e. instead of being sized exactly to the headers of the day, it would intentionally be larger to allow variation over time. It wouldn't really do anything for most of the MTU concerns discussed here or for clients, but I think it would have been helpful for the evolution of wired protocols.

Happy eyeballs! Yes, I loved that one! I was always a huge IPv6 nerd as well, though I didn't get started until shortly after that. The "nats are good" video isn't ringing any bells but if you have a link I'd definitely give it a watch as it sounds right up my humour alley.

Unfortunately all of that Cisco affiliation means we are forever blood enemies and can never speak again... ;). I kid, I came up through the Nortel heritage originally so I'm bound by contract to make such statements.

I've heard great things about the Fast Data Project, I'll definitely have to look into it some before the Oblivion remake comes out :). Maybe after this current project at work I'll finally get to mess with software based dataplanes properly.

It was great running into you here, I hope to catch you around more now that I know to look!

ay 8 months ago | parent [-]

> more like 802.11

L2 is “relatively simple” in the sense that it’s usually under the same administrative control, unlike L3. And even then, if you have a look at all the complexity of maintaining interop in the wireless space… it’s amazing it works as well as it does, with so much functionality being conditional.

> "nats are good" video isn't ringing any bells

https://youtu.be/v26BAlfWBm8?feature=shared - it was a bit of a meme back at the time in making the “X fanboy” videos.

> I came up through the Nortel heritage originally

My networking cradle is Netware 4.1, and in those times it was a zoo of protocols anyway. I really liked conceptually the elegance of Nortel management being SNMP-first. Makes me smile hearing all these “API-first!” claims today.

> It was great running into you here

Indeed, nice to meet you too ! :-)

I do a fair bit of lurking. Yesterday was a bit of an anomaly since the whole “truncation as a means to do PMTUD” has been a subject of my idle pondering for more than a decade, so it struck a chord :-)

zamadatix 8 months ago | parent [-]

"Go buy some weed and smoke it" LOL