eurekin 3 hours ago

Correct, most of r/LocalLlama has moved on to next-gen MoE models. DeepSeek introduced a few good optimizations that every new model now seems to use. Llama 4 was generally seen as a fiasco, and Meta hasn't made a release since.

fragmede 2 hours ago | parent [-]

What are some of the models people are using? (Rather than naming the ones they aren't.)

eurekin an hour ago | parent [-]

GLM 4.7 is new and promising. MinMax 2.1 is good for agents. And of course the Qwen3 family; the VL versions are spectacular. NVIDIA Nemotron Nano 3 excels at long context, and the Unsloth variant has been extended to 1M tokens.

I thought the last one was a toy until I tried it with a full 1.2 MB repomix project dump. It actually works quite well for general code comprehension across the whole codebase, CI scripts included.
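As a back-of-the-envelope check that a dump of that size even fits in a 1M-token window (assuming the common rough rule of ~4 characters per token; the exact ratio depends on the tokenizer and is not from the thread):

```python
# Rough sanity check: will a repomix dump fit in a long-context window?
# Assumes ~4 characters per token, a common rule of thumb for code/English.

def estimated_tokens(size_bytes: int, chars_per_token: float = 4.0) -> int:
    """Estimate token count from a file size in bytes."""
    return int(size_bytes / chars_per_token)

dump_size = int(1.2 * 1024 * 1024)  # the 1.2 MB repomix dump mentioned above
context_window = 1_000_000          # the 1M-token extended context

tokens = estimated_tokens(dump_size)
print(f"~{tokens:,} tokens; fits in 1M window: {tokens < context_window}")
```

By this estimate the dump is only around 300k tokens, well under the extended window, which is consistent with the whole codebase being visible to the model at once.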

gpt-oss-120b is good too, although I have yet to try it for coding specifically.