Remix.run Logo
toast0 5 hours ago

> But my students weren’t as happy as I was - they wanted to build something genuinely useful, and they were really disappointed that our “product” had strong architectural limits and couldn’t outperform titans like nginx and haproxy.

I took a (very brief) look at the github repo [1], it doesn't look like you're doing anything with cpu pinning.

You can probably eke (thanks) out a bit more performance if you cpu pin your threads and cpu pin your listen sockets (sockopt SO_INCOMING_CPU).

If you also cpu align your outgoing sockets, you should get a significant boost, but afaik, there's no great api for that. Linux does have an api for compatible NICs (traffic steering/flow steering) which can work, but if you know what hash your NIC uses (it's probably toeplitz) and you manage source port selection to your backend, you can pick ports that will hash properly.

The goal is for your proxy to be able to handle packets without any cross cpu communication.

[1] https://github.com/sibexico/TinyGate

Sibexico 4 hours ago | parent | next [-]

Basically, v0 and v1 of the repo is completely different implementations, written almost from scratch. Now working on the 3rd one implementation, I believe the last one. :) Completely different architectural choices was made.

toast0 an hour ago | parent [-]

If it's still running on more than a single core, and your students want it to go faster, aligning the work to cpus will almost certainly be useful.

I saw you mentioned windows development elsewhere. You might be interested to know that Microsoft pionered Receive Side Scaling and Send Side Scaling. If you try your proxy out on Windows, be sure to hook into those systems there.

The less work your proxy does, the more important avoiding cross core communication is.

camkego 32 minutes ago | parent [-]

Pin threads to cores, and make sure threads different cores aren’t writing to the same 64 or 128 byte block. Lookup “false sharing”

jibal 4 hours ago | parent | prev [-]

eke