indigodaddy 16 hours ago

By using ping or MTR, they are testing general connectivity to an endpoint, regardless of what service is in play. For example, if you are getting significant packet loss on the endpoint itself in the output of an MTR, then that IS indicative of a network/route/connectivity problem somewhere along the route (could still be an endpoint issue but definitely not always). The service in question doesn't matter much at that point. Whether the service itself is healthy or not, you are still troubleshooting the overarching issue presented by the bad ping/MTR.

op00to 14 hours ago | parent | next [-]

No, ping can be deprioritized while actual interesting packets pass through with much less latency.

esseph 8 hours ago | parent | next [-]

> mtr -u

>

> Use UDP datagrams instead of ICMP ECHO.

Rules out router control plane protection mechanisms specifically for ICMP echo rate limiting.

indigodaddy 13 hours ago | parent | prev [-]

Sure, but if ping is not blocked at the endpoint, and there is not insignificant packet loss directly at the endpoint, then that demonstrates a network issue that needs to be looked at.
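
To make the "loss directly at the endpoint" distinction concrete, here is a minimal Python sketch. It assumes you have already pulled the per-hop loss percentages out of an MTR report; the function name and the 5% threshold are illustrative assumptions, not anything MTR defines. Loss that shows up at a middle hop but does not carry through to the destination is commonly the ICMP deprioritization mentioned upthread, while loss that persists to the final hop is the case worth chasing.

```python
from typing import List, Optional


def first_persistent_loss_hop(
    hop_loss_pct: List[float], threshold: float = 5.0
) -> Optional[int]:
    """Return the 1-based hop where loss starts and then persists to the end.

    hop_loss_pct lists the loss percentage per hop in path order, with the
    destination as the last entry. Returns None when the final hop shows
    loss below the threshold (hypothetical default: 5%).
    """
    first = None
    # Walk backwards from the destination while loss stays above threshold.
    for index in range(len(hop_loss_pct) - 1, -1, -1):
        if hop_loss_pct[index] >= threshold:
            first = index + 1
        else:
            break
    return first


# Loss at hop 2 only: does not reach the endpoint, so nothing persistent.
print(first_persistent_loss_hop([0.0, 60.0, 0.0, 0.0]))
# Loss from hop 3 through to the endpoint.
print(first_persistent_loss_hop([0.0, 0.0, 20.0, 25.0, 30.0]))
```

This only summarizes numbers you already have. It says nothing about why a hop is dropping probes.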

gerdesj 16 hours ago | parent | prev [-]

ping and mtr only test one thing. I have saws, drills, routers (lol), planes, screwdrivers, hammers and more in my workshop. To be fair a drill driver and a hammer get a lot of jobs done! However, I will get the impact driver out or a fret saw with a very fine blade as the job requires.

The article here is about a loss of DNS service and proves it with ping. That is wrong and you know it. Diagnosing the fault should involve ping but that is not how you conclusively prove DNS is not working.

To be honest you cannot conclusively prove anything in this game but you can at least explore the problem space effectively from your perspective with whatever you have access to. I happen to have a RIPE Atlas probe at work with a gigantic amount of credit, so I could probably get that system to test Cloudflare DNS from a lot of locations.

If you present to a doctor with some mild but alarming chest pains, I'd hope they wouldn't just look at you and prescribe a course of leeches. A stethoscope is a good start (ICMP) but an ECG (dig) is better. That analogy might need some work 8)

indigodaddy 16 hours ago | parent | next [-]

If you have a demonstrated network/connectivity problem to an endpoint that provides DNS, then DNS is down (or at the very least degraded) for you. If a functionality of layer 3 is not working, should we expect layer 4 to work, and keep looking into aspects of layer 4 and/or layer 7, or would it make more sense to keep troubleshooting the layer 3 issue?? Any entry level NOC Technician would know at this point that doing digs/queries to the endpoint would not necessarily be meaningful when we have an underlying connectivity/network problem that is the likeliest main contributor to the issue.

gerdesj 15 hours ago | parent | next [-]

"Any entry level NOC Technician would know at this point"

I'm just a consultant who's been mucking about with networks for 30+ years. I'm sure your highly paid technicians will teach granddad a thing or two.

I note you switch between the OSI seven layer model and the ARPA four layer one with gay abandon. What are you doing at layers five and six?

We are all engineers here (whether chartered or not). The big question is - "Is the service up"? The service is DNS.

We go to the toolbox as any engineer does and use a tool for the job. I can hammer a screw into a wall or use a screwdriver - both will work but one will work effectively. I'll use dig but I imagine that a Windows jockey will use nslookup - both will work.

dig/nslookup fail? OK, now we look at connectivity issues - that's when ping comes in. However we do not own the DNS service and we cannot know that it is now dropping pings for some reason. Then we might play games with packet generators and Wireshark to try and determine what is going on. However, we do not run that failing service and all we can conclusively ... conclude is that for us, it is not working.

That's a far cry from Cloudflare DNS is down for everyone. We can only conclude that Cloudflare DNS is not working for me.
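
For anyone without dig or nslookup to hand, the "test the service itself" step can be sketched with nothing but the Python standard library. This is a rough illustration, not a replacement for dig: the function names are made up for this example, it only asks for an A record over UDP, and it only checks that a matching response came back, not what the response says.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is length-prefixed, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if `server` replies over UDP port 53 with our query ID."""
    query_id = 0x1234
    query = build_dns_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeouts and unreachable errors both land here
            return False
    if len(reply) < 4:
        return False
    # A response echoes the ID and has the QR bit (top bit of flags) set.
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and bool(flags & 0x8000)
```

Calling `dns_answers("1.1.1.1", "example.com")` from your own machine tells you whether the resolver answered you. As above, a False result means "not working for me", nothing wider.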

indigodaddy 15 hours ago | parent [-]

You seem to be not addressing my main point, which is, once we are confident we have a network/connectivity issue, what is the benefit of now focusing on the outcomes of DNS queries? How does that help us at this point, when we know that DNS is not working for us in large part due to not being able to reliably connect to the endpoint itself?

In regard to an endpoint out of our control, once we demonstrate that we cannot connect to it, or that there are serious connectivity problems in general, "is the service (that the endpoint provides) up?" is not a question that we need or should be trying to answer at that point.

That's cool though, if you want, you can just keep doing digs to an endpoint that is degraded from a network perspective, while I keep trying to troubleshoot why we have packet loss to the endpoint..

vel0city 15 hours ago | parent | prev [-]

Plenty of hosts may respond to DNS while filtering ICMP. Showing a ping failure as an example of some authoritative layer 3 failure shows a misunderstanding of what ping is doing.

indigodaddy 15 hours ago | parent | next [-]

Sure, but here we are talking about an endpoint that we know should/previously responded to ICMP, and then are subsequently having a problem with it. So if we are now having a problem with the service provided by the endpoint, AND we see not insignificant packet loss on MTR/ping (or intermittent TTL exceeded which points to route issues), then we can be pretty certain we have a connectivity/network/route problem. Which is a problem at layer 3. My point in this whole thing is that once we know that, it makes no sense to say, oh let's shift to or we really should be "troubleshooting the service/application that the endpoint is providing" whether that be https or DNS or whatever. No, we keep troubleshooting the network/connectivity issues if/once we are confident that the problem lies therein.

vel0city 15 hours ago | parent [-]

> that we know should/previously responded to ICMP,

Is there any documentation or contract that says this shall always respond to ICMP traffic?

Isn't it possible ICMP is being filtered but not DNS?

Imagine if they had misconfigured their DNS, did a ping to 1.1.1.1, and decided 1.1.1.1 DNS is obviously down even though potentially only ICMP traffic was affected.

Imagine someone having issues with a web server so they show their proof of the web server being down by showing it won't connect with SMTP traffic. It is the same concept as showing a ping.
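
The four combinations of "does ping answer" and "does DNS answer" can be written out as a small Python sketch. The function name and the wording of each verdict are illustrative assumptions, and every verdict is scoped to the one vantage point the probes were run from.

```python
def classify(dns_ok: bool, icmp_ok: bool) -> str:
    """Summarize what two probes from a single vantage point can support."""
    if dns_ok and icmp_ok:
        return "DNS and ICMP both answer from here"
    if dns_ok and not icmp_ok:
        # The case described above: ping fails, yet the service is fine.
        return "DNS answers; ICMP is filtered, rate limited, or lost"
    if not dns_ok and icmp_ok:
        return "host answers ICMP; DNS is failing or port 53 is filtered"
    return "neither answers from here; more probes needed to say why"


for dns_ok in (True, False):
    for icmp_ok in (True, False):
        print(dns_ok, icmp_ok, "->", classify(dns_ok, icmp_ok))
```

Note that a failed ping appears in two rows with very different meanings, which is why it cannot stand in for a DNS test on its own.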

indigodaddy 15 hours ago | parent [-]

Even if the dst host is blocking ICMP, there is still value and plenty to be learned from an MTR output, even enough to show a network/route issue.

esseph 15 hours ago | parent | prev [-]

Ping and MTR are actually several different tools between them.

Connectivity over ICMP / UDP / TCP, DNS resolution, Autonomous System path, MPLS circuit, IPv4 / IPv6 routing, circuit to endpoint latency, per hop firewall configuration, device packet security configuration, jitter, MTU, and probably some other things I'm forgetting.

A carpenter knows their tools.

gerdesj 15 hours ago | parent [-]

"A carpenter knows their tools."

Quite, and they also know when to use them effectively.

I have no idea what "Autonomous System path" is but it looks like someone searching terms. An Autonomous System is a BGP thing.

You say "I'm forgetting" and I say - you don't have much skin in this game.

I've spent roughly 30 years getting to grips with this stuff.

esseph 14 hours ago | parent [-]

I have helped design some of the network hardware and software you may have used, but I'm not sure how that's relevant. Pointless D measuring.

My point stands, which is: There are a lot of capabilities in these tools that should not be overlooked or dismissed.

In addition, reachability of the service is one of the things you would note with said tools as you work your way through the stack. You can even use MTR to see if the DNS server is holding port 53 open.
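
As far as I know, mtr can send TCP probes to a chosen port (the --tcp and --port options; check the man page for your version). The final step of that check, whether something is accepting connections on port 53, can also be sketched in a few lines of Python. The function name is made up for this example, and a successful connect only shows the port is open, not that the resolver behind it is healthy.

```python
import socket


def tcp_port_open(host: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes in time."""
    try:
        # create_connection performs the full TCP handshake, then we close.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, and timed out all land here
        return False
```

For example, `tcp_port_open("1.1.1.1")` checks TCP port 53 on that resolver from wherever you run it.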