How does it compare to popular local inference engines, e.g. Ollama, LM Studio, or hand-rolled llama.cpp? I saw a brief benchmark in the README but wasn't sure if there was more.