robotswantdata · 2 hours ago
Feels 100% vibe coded, in a bad way. llama.cpp already has KV cache compression, and one of the turbo quant PRs will get merged at some point. If you don't care about the fancy 3-bit, the q8 KV compression is good enough! Don't bother with q4:

    ./build/bin/llama-server -m model.gguf \
      --cache-type-k q8_0 \
      --cache-type-v q8_0 \
      -c 65536

Etc.
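The savings from q8_0 KV compression are easy to ballpark. A rough sketch (the model dimensions below are assumed, illustrative 70B-class numbers, not from the thread; llama.cpp's q8_0 stores 32 int8 values plus one fp16 scale per block, about 8.5 bits per value):

```python
def kv_cache_bytes(ctx, n_layers=80, n_kv_heads=8, head_dim=128, bits_per_val=16.0):
    # K and V each hold n_layers * n_kv_heads * head_dim values per token.
    vals_per_token = 2 * n_layers * n_kv_heads * head_dim
    return int(ctx * vals_per_token * bits_per_val / 8)

ctx = 65536
f16 = kv_cache_bytes(ctx, bits_per_val=16.0)
q8 = kv_cache_bytes(ctx, bits_per_val=8.5)  # q8_0 ~ 8.5 bits/value incl. scales
print(f"f16:  {f16 / 2**30:.1f} GiB")   # f16:  20.0 GiB
print(f"q8_0: {q8 / 2**30:.1f} GiB")    # q8_0: 10.6 GiB
```

So at a 65536-token context you roughly halve KV cache memory with `--cache-type-k q8_0 --cache-type-v q8_0`.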
aegis_camera · an hour ago
One of my users requested an MLX comparison with GGUF; he wanted to run the benchmark himself. I was thinking about how to get MLX support without bundling the Python code into SharpAI Aegis, a local or BYOK local security agent (https://www.sharpai.org), so I had to pick up Swift and build it. The benchmark shows a benefit for the MLX engine, so it's the user's choice which engine to use; aegis-ai supports both : )
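For anyone wanting to run a similar comparison, here's a minimal timing harness sketch. The `generate` callables are hypothetical stand-ins (not aegis-ai's actual API): in real use each would call its engine and return the number of tokens produced.

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Average decode throughput of a backend's generate() callable,
    which is assumed to return the token count it produced."""
    total_tokens = 0
    total_time = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate(prompt)
        total_time += time.perf_counter() - start
    return total_tokens / total_time

# Stub backends standing in for MLX and llama.cpp; real code would
# invoke each engine and count the generated tokens.
def fake_mlx(prompt):
    time.sleep(0.01)
    return 64

def fake_gguf(prompt):
    time.sleep(0.02)
    return 64

print(f"mlx:  {tokens_per_second(fake_mlx, 'hi'):.0f} tok/s")
print(f"gguf: {tokens_per_second(fake_gguf, 'hi'):.0f} tok/s")
```

Swapping the stubs for real engine calls gives an apples-to-apples tokens/sec number per backend.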