Remix.run Logo
satvikpendem an hour ago

Ollama is not recommended [0], use llama.cpp or more specifically Unsloth Studio which wraps llama.cpp and which has an API mode you can use to hook into Hermes or another agent. Unsloth make both the Studio and the quants which fix various issues with many models [1] as well as implementing new features like MTP and QAT support much sooner than other teams. In general you should read r/LocalLLaMa as it has a lot of updates regarding local models as the field moves fast.

[0] https://sleepingrobots.com/dreams/stop-using-ollama/

[1] https://github.com/unslothai/unsloth/discussions/4921