
	LoganDark 8 hours ago
		For inference, but yes. Many hundreds of output tokens per second is the norm, in my experience. I don't recall the prompt processing figures, but I think they were somewhere in the low hundreds of tokens per second (so slightly slower than inference).