Remix clone Hacker News

	verdverm 5 hours ago
		I have tried llama.cpp, but vLLM is nicer: it uses Ray, handles request queueing, and doesn't have the cache invalidation bug for Qwen/Gemma models. Also, Unsloth has toxic employees in their Discord. I've run two Qwen/Gemma models at 8-bit with full context windows side by side. Right now I have four models on my Spark (qwen36moe, an embedding model, a reranker, and qwen3-1.7B) to support my Markdown knowledge-base tool. The setup is not as capable, but it's still good and keeps getting better as models and algorithms improve. To me, it's more about the freedom to tinker, freedom from token-bill anxiety, and a potential right to compute should the government/oligarchy decide it gets to choose who can access which models.
	woadwarrior01 an hour ago | parent
		> unsloth has toxic employees in their discord

		Would you mind elaborating on this?