GrinningFool 3 hours ago

That's a huge gap for llama.cpp server - any idea why?

zambelli 3 hours ago | parent [-]

Best guess is it's native mode. The function-calling template is just broken for Nemo.

I did go with an extreme (but true) example in the post. Other deltas are smaller but still statistically significant: a 30 pt swing between llamaserver prompt and ollama, and a 4-5 pt swing between llamafile and llamaserver prompt.