lustre-fan 2 days ago
Yeah, Lustre supports EFA as a network transit between a Lustre client and a Lustre server. It's lnet/klnds/kefalnd/ in the Lustre tree. But Lustre doesn't support NVMeoF directly. It uses a custom protocol. And neither does EFA. Someone would have to modify the NVMeoF RDMA target/host drivers to support it. EFA already supports in-kernel IB clients (that's how Lustre uses EFA today). So it's not an impossible task. It's just that no one has done it.
foota 2 days ago
Hey, thanks for the comment! Also, I'm amused by the specificity of your account, haha. Do you have something set to monitor HN for mentions of Lustre?

> "But Lustre doesn't support NVMeoF directly. It uses a custom protocol."

Could you link me to this? I searched the Lustre repo for nvme and didn't see anything that looked promising, but I'd be curious to read how this works.

> "And neither does EFA. Someone would have to modify the NVMeoF RDMA target/host drivers to support it."

To confirm, you're saying there'd need to be something like an EFA equivalent to https://kernel.googlesource.com/pub/scm/linux/kernel/git/tor... (and corresponding initiator code)?

> "EFA already supports in-kernel IB clients (that's how Lustre uses EFA today). So it's not an impossible task. It's just that no one has done it."

I think you're saying there's already in-kernel code for interfacing with EFA, because this is how lnet uses EFA? Is that https://kernel.googlesource.com/pub/scm/linux/kernel/git/tyc...? I found this, but I wasn't sure if it was actually the dataplane (for lack of a better word) part of things. From what I read, it sounded like most of the dataplane was implemented in userspace as part of libfabric, but it sounds like I might be wrong.

Does this mean you can generally just pretend that EFA is a normal IB interface and have things work out? If that's the case, why doesn't NVMe-oF just support it naturally?

Just trying to figure out how these things fit together. I appreciate your time!

In case you're curious, I have a stateful service that has an NVMe-backed cache over object storage, and I've been wondering what it would take to run some proxy services that can read directly from that cache, to scale out the read throughput from an instance.