I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x as many tokens as the code itself, and much more for harder tasks.


unglaublich 5 hours ago

Indeed. At 30 tok/s, make it pause for 20 seconds while the "thinking" streams (hidden from view); that's the real experience.
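
For scale, here is a rough sketch of the arithmetic behind those figures: a 20-second pause at 30 tok/s is 600 hidden tokens, which at the 2x to 3x ratio mentioned above corresponds to roughly 200 to 300 tokens of visible code. The function name and parameters are made up for illustration and are not taken from the demo.

```python
def hidden_thinking_seconds(code_tokens: int,
                            thinking_ratio: float,
                            tokens_per_second: float) -> float:
    """Seconds a demo would stall while hidden thinking tokens stream.

    All three parameters are illustrative assumptions:
    code_tokens is the visible output size, thinking_ratio is how many
    thinking tokens are spent per code token, tokens_per_second is the
    generation speed.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    thinking_tokens = code_tokens * thinking_ratio
    return thinking_tokens / tokens_per_second


# 200 visible code tokens, 3x thinking overhead, 30 tok/s
print(hidden_thinking_seconds(200, 3.0, 30.0))  # 20.0
```

A demo could sleep for that long before showing the first visible token; for harder tasks the ratio would be set well above 3.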



sig_kill 2 hours ago

You should check out https://tokey.ai. I made it a few months ago, and it already covers all of these suggestions.


redox99 4 hours ago

Yes, it should use actual output from some of the open models.