Remix clone Hacker News


	▲	kame3d 4 hours ago
		Interesting! I just tried the quantized Q4_K_M build from [1] on my RTX 2070 Super. It ran at 110 tok/s with 1800 tok/s prefill and found the same solution to your prompt. It generated valid LaTeX for the answer, but its reasoning trace mostly uses compact ASCII math notation. It took 3 min 22 s to answer, spending 22k tokens, almost all of them on thinking. [1] https://huggingface.co/prithivMLmods/VibeThinker-3B-GGUF
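As a rough sanity check on those numbers, here is a minimal sketch. It assumes roughly 22,000 generated tokens at a steady 110 tok/s and ignores prefill, since the prompt length isn't stated. The function names are just for illustration.

```python
def decode_seconds(tokens: int, tok_per_s: float) -> float:
    """Estimated generation time, assuming a constant decode rate."""
    return tokens / tok_per_s


def fmt_min_sec(seconds: float) -> str:
    """Format a duration in seconds as 'Mmin SSs'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}min {secs:02d}s"


# Assumed inputs: "22k tokens" taken as 22,000, decode rate of 110 tok/s.
estimate = decode_seconds(22_000, 110.0)
print(fmt_min_sec(estimate))  # prints "3min 20s"
```

That lands within a couple of seconds of the measured 3 min 22 s, so the reported decode speed and token count are consistent with each other. The small remainder would plausibly be prefill plus startup overhead.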