OpenOnload is faster than the kernel network stack even with the most basic configuration, which is pretty much drop-in and requires zero changes to your application.
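
To illustrate what "drop-in" means here, below is a minimal sketch of an ordinary sockets program. Nothing in it is OpenOnload-specific: the function name `udp_round_trip` and the file name `echo.py` are made up for the example, and it uses loopback only so it runs anywhere (as far as I know, real acceleration only applies to traffic going over a supported Solarflare/Xilinx interface).

```python
import socket


def udp_round_trip(payload: bytes) -> bytes:
    """Send payload to a local UDP echo socket and return the echoed reply.

    Uses only the standard BSD sockets API, which is the layer a
    user-space stack like OpenOnload intercepts. Hypothetical example,
    not taken from the OpenOnload documentation.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.settimeout(2.0)
        client.settimeout(2.0)

        # Client sends, "server" echoes the datagram back to the sender.
        client.sendto(payload, server.getsockname())
        data, addr = server.recvfrom(2048)
        server.sendto(data, addr)

        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

Assuming the script is saved as `echo.py`, you would run it on the kernel stack with `python3 echo.py` and under OpenOnload with `onload python3 echo.py`. The source is identical in both cases; the `onload` launcher preloads the user-space stack for the process.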