eurekin 3 hours ago

GLM 4.7 is new and promising. MiniMax 2.1 is good for agents. And of course the Qwen3 family; the VL versions are spectacular. NVIDIA Nemotron Nano 3 excels at long context, and the Unsloth variant has been extended to 1M tokens.

I thought the last one was a toy until I tried it with a full 1.2 MB repomix project dump. It actually works quite well for general code comprehension across the whole codebase, CI scripts included.
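For reference, a minimal sketch of producing such a dump with repomix (flag names per its docs; the output file name is my own choice, and the model you feed it to is whatever long-context model you run):

    # pack the whole repo, CI scripts included, into one file
    npx repomix --style markdown -o project-dump.md
    # then attach project-dump.md to the long-context model's prompt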

GPT-OSS-120B is good too, although I have yet to try it for coding specifically.

magicalhippo an hour ago

Since I'm just a pleb with a 5090, I run GPT-OSS 20B a lot; it fits comfortably in VRAM with max context size. I find it quite decent for a lot of things, especially after setting reasoning effort to high, disabling top-k and top-p, and setting min-p to something like 0.05.
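A minimal llama-server sketch of those settings, assuming a local GGUF (adjust the model path and context size for your setup; the reasoning-effort kwarg is only in newer llama.cpp builds):

    llama-server -m gpt-oss-20b.gguf \
        -c 131072 -ngl 99 \
        --top-k 0 --top-p 1.0 --min-p 0.05 \
        --chat-template-kwargs '{"reasoning_effort": "high"}'

Here --top-k 0 and --top-p 1.0 effectively disable those samplers, leaving min-p to do the filtering.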

For the Qwen3-VL, I recently read that someone got significantly better results by using F16 or even F32 versions of the vision model part, while using a Q4 or similar for the text model part. In llama.cpp you can specify these separately[1]. Since the vision model part is usually quite small in comparison, this isn't as rough as it sounds. Haven't had a chance to test that yet though.

[1]: https://github.com/ggml-org/llama.cpp/blob/master/tools/serv... (using --mmproj AFAIK)
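For anyone wanting to try that split, a sketch assuming separately downloaded GGUFs (--mmproj is the llama.cpp flag from the link above; the file names are hypothetical):

    # text model quantized to Q4, vision projector kept at F16
    llama-server -m Qwen3-VL-8B-Instruct-Q4_K_M.gguf \
        --mmproj mmproj-Qwen3-VL-8B-Instruct-F16.gguf \
        -ngl 99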

nightski 2 hours ago

Does GLM 4.7 run well on the Spark? I thought I read that it didn't, but it wasn't clear.