robotswantdata · 2 hours ago
Feels 100% vibe coded, in a bad way. llama.cpp already has KV cache compression, and one of the turbo quant PRs will get merged at some point. If you don't care about the fancy 3-bit, the q8 KV compression is good enough! Don't bother with q4:

    ./build/bin/llama-server -m model.gguf \
      --cache-type-k q8_0 \
      --cache-type-v q8_0 \
      -c 65536

Etc.
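The savings from q8_0 KV compression are easy to ballpark. A rough sketch (the model dimensions below are assumed, illustrative 70B-class numbers, not from the thread; llama.cpp's q8_0 stores 32 int8 values plus one fp16 scale per block, about 8.5 bits per value):

```python
def kv_cache_bytes(ctx, n_layers=80, n_kv_heads=8, head_dim=128, bits_per_val=16.0):
    # K and V each hold n_layers * n_kv_heads * head_dim values per token.
    vals_per_token = 2 * n_layers * n_kv_heads * head_dim
    return int(ctx * vals_per_token * bits_per_val / 8)

ctx = 65536
f16 = kv_cache_bytes(ctx, bits_per_val=16.0)
q8 = kv_cache_bytes(ctx, bits_per_val=8.5)  # q8_0 ~ 8.5 bits/value incl. scales
print(f"f16:  {f16 / 2**30:.1f} GiB")   # f16:  20.0 GiB
print(f"q8_0: {q8 / 2**30:.1f} GiB")    # q8_0: 10.6 GiB
```

So at a 65536-token context you roughly halve KV cache memory with `--cache-type-k q8_0 --cache-type-v q8_0`.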
aegis_camera · an hour ago
One of my users requested an MLX comparison with GGUF; he wanted to run the benchmark himself. I was thinking about how to get MLX support without bundling the Python code into SharpAI Aegis, a local or BYOK local security agent (https://www.sharpai.org), so I had to pick up Swift and build it. The benchmark shows a benefit for the MLX engine, so it's the user's choice which engine to use; aegis-ai supports both : )
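For anyone wanting to run a similar comparison, here's a minimal timing harness sketch. The `generate` callables are hypothetical stand-ins (not aegis-ai's actual API): in real use each would call its engine and return the number of tokens produced.

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Average decode throughput of a backend's generate() callable,
    which is assumed to return the token count it produced."""
    total_tokens = 0
    total_time = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate(prompt)
        total_time += time.perf_counter() - start
    return total_tokens / total_time

# Stub backends standing in for MLX and llama.cpp; real code would
# invoke each engine and count the generated tokens.
def fake_mlx(prompt):
    time.sleep(0.01)
    return 64

def fake_gguf(prompt):
    time.sleep(0.02)
    return 64

print(f"mlx:  {tokens_per_second(fake_mlx, 'hi'):.0f} tok/s")
print(f"gguf: {tokens_per_second(fake_gguf, 'hi'):.0f} tok/s")
```

Swapping the stubs for real engine calls gives an apples-to-apples tokens/sec number per backend.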