dragonwriter | 6 days ago |
> My RTX 5090 is about 10x faster (measured by FP32 TFLOPS) and I still don't find it to be fast enough. I can't imagine using something so slow for AI/ML. Only 2.2 tokens/sec on an 8B parameter Llama model? That's slower than someone typing.

It's also orders of magnitude slower than what I normally see cited by people using 5090s; heck, it's even much slower than what I see on my own 3080 Ti laptop card for 8B models, though I usually won't use more than an 8bpw quant for a model that size.
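A quick sanity check backs this up: single-user LLM decoding is typically memory-bandwidth-bound, so the throughput ceiling is roughly memory bandwidth divided by the size of the weights. Here is a minimal back-of-the-envelope sketch, assuming roughly 1.8 TB/s of memory bandwidth for a 5090 and an 8B model quantized to 8 bits per weight (both figures are illustrative assumptions, not numbers from the thread):

    # Rough bandwidth-bound estimate for single-batch decoding:
    # each generated token must read every weight once, so
    # tokens/sec <= memory bandwidth / bytes of weights.

    bandwidth_gb_s = 1800   # assumed RTX 5090 memory bandwidth, GB/s
    model_gb = 8.0          # 8B params at 8 bits per weight ~= 8 GB

    upper_bound_tok_s = bandwidth_gb_s / model_gb
    print(f"Rough upper bound: {upper_bound_tok_s:.0f} tokens/sec")
    # ~225 tokens/sec -- so 2.2 tok/s is about 100x below the
    # bandwidth ceiling, consistent with something being misconfigured
    # (e.g. falling back to CPU or spilling weights out of VRAM).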
Sohcahtoa82 | 6 days ago | parent |
Yeah, I must be doing something wrong. Someone else pointed out that I should be getting much better performance. I'll be looking into it. |