__turbobrew__ 3 days ago

> This system will allot network resources on a per-customer basis, creating a budget that, once exceeded, will prevent a customer's traffic from degrading the service for anyone else on the platform

How would this work in practice? If a single client is overflowing the edge router queues, aren't you kind of screwed already? Even if you dropped all packets from that client, you would still need to process each packet to figure out which client it belongs to before dropping it.

I guess you could do some shuffle sharding, where each client is assigned to a small subset of IP prefixes, and when a client misbehaves you withdraw those prefixes using BGP, essentially black-holing the network routes for that client. If the shuffle sharding is done right, only the problem client loses all of its prefixes; other clients sharing one of those prefixes are also sharded onto other, unaffected prefixes.

milofeynman 3 days ago | parent | next [-]

It's load shedding, but weighted towards customers exceeding their quota, usually measured over some rolling weighted average. The benefit is that their traffic is dropped immediately at the edge rather than holding sockets open or consuming compute. It usually takes 30s-1m to kick in.

Thorrez 3 days ago | parent | prev | next [-]

In this specific case, it wasn't requests from the client that caused overload. It was the responses to those requests. So Cloudflare can avoid sending responses, and prevent the problem.

You're right that this doesn't solve all cases, but it would have prevented this case.

jcalvinowens 3 days ago | parent | prev | next [-]

> Even if you dropped all packets from that client you would need to still process the packets to figure out what client they belong to before dropping the packets?

In modern Linux you can write BPF-XDP programs to drop traffic at the lowest level in the driver before any computation is spent on them at all. Nearly the first thing the driver does after getting new packets in the rx ring buffer is run your program on them.

__turbobrew__ 2 days ago | parent [-]

Say you have a BPF-XDP program that inspects each packet to figure out which client it came from and selectively drops it. Is that really going to be cheaper than just forwarding the packet from the edge router to the next hop? I find it hard to believe that running such a program would actually alleviate full queues when all the edge router is doing is forwarding to the next hop anyway.

jeffbee 3 days ago | parent | prev | next [-]

Perhaps they drop the client's flows on the host side.

__turbobrew__ 3 days ago | parent [-]

I don’t understand. The issue is that a client/customer outside of Cloudflare's control DoSed one of their network links. Cloudflare has no control on the client side, so how would it implement rate limiting there?

fusl 3 days ago | parent [-]

I think you misunderstand the flow of traffic here. The data flow, initiated by requests coming from AWS us-east-1, was Cloudflare towards AWS, not the other way around. Cloudflare can easily control where and how their egress traffic gets to the destination (as long as there are multiple paths towards the target) as well as rate limit that traffic to sane levels.

__turbobrew__ 3 days ago | parent [-]

Ah, I see now. Yes, in that case they could just reply with 429 status codes or not reply at all.

everfrustrated 3 days ago | parent | prev [-]

I think you're overthinking this. Just having a per-customer (Cloudflare customer) rate limit would go a long way.