SXX 5 hours ago
I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x as many tokens as the code itself, and much more for harder tasks.
unglaublich 5 hours ago
Indeed. At 30 tok/s, make it pause for 20 seconds while the "thinking" is streaming (and hidden); that's the real experience.
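A rough sketch of that arithmetic, assuming a hypothetical 200-token code snippet, the upper 3x thinking ratio from the parent comment, and the 30 tok/s rate above (the function name is made up for illustration):

```python
def hidden_thinking_seconds(code_tokens: int, thinking_ratio: float, tokens_per_second: float) -> float:
    """Seconds a simulated demo would pause while hidden reasoning tokens stream.

    Assumes thinking cost scales linearly with the size of the visible code
    output (a hypothetical simplification).
    """
    thinking_tokens = code_tokens * thinking_ratio
    return thinking_tokens / tokens_per_second


# Hypothetical example: 200 tokens of code, 3x thinking overhead, 30 tok/s
pause = hidden_thinking_seconds(200, 3.0, 30.0)
print(f"{pause:.1f} s")  # 20.0 s
```

Harder tasks with a larger ratio stretch the pause proportionally.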
sig_kill 2 hours ago
You should check out https://tokey.ai; I made it a few months ago, and it has all of these suggestions.
redox99 4 hours ago
Yes, it should use actual output from some of the open models. | ||||||||