NitpickLawyer 6 hours ago
It really depends. The new "thinking" models usually spend some time reasoning before they write the final answer. If a model "thinks" for 1k tokens, that's about a minute of spinning wheel you're going to see for each question. Add prompt processing time and the drop in generation speed as the context grows, and it becomes really slow for longer sessions.
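
As a rough back-of-the-envelope sketch of the arithmetic in this comment: 1k thinking tokens taking about a minute corresponds to a generation speed of roughly 16 tokens/s. The function name, the prompt size, and the prompt-processing speed below are illustrative assumptions, not measurements from any particular setup.

```python
def seconds_before_answer(thinking_tokens, gen_tps, prompt_tokens=0, prompt_tps=None):
    """Estimate the wait before the final answer starts streaming.

    thinking_tokens: tokens the model emits while "thinking"
    gen_tps:         generation speed in tokens per second (assumed constant)
    prompt_tokens:   size of the prompt/context to process first
    prompt_tps:      prompt-processing speed in tokens per second
    """
    # Prompt processing only contributes if a speed is given.
    prefill = prompt_tokens / prompt_tps if prompt_tps else 0.0
    return prefill + thinking_tokens / gen_tps


# 1k thinking tokens at an assumed 16 tokens/s: about a minute.
print(seconds_before_answer(1000, 16.0))  # 62.5

# Same, plus a hypothetical 4k-token prompt processed at 200 tokens/s.
print(seconds_before_answer(1000, 16.0, prompt_tokens=4000, prompt_tps=200.0))  # 82.5
```

This treats generation speed as constant, so it understates the wait for long sessions, where speed drops as the context grows.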
mudkipdev 5 hours ago
Reminds me of the possibility of running DeepSeek at 3-4 t/s with SSD streaming. That could be viable if you are running something overnight, for example.
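
To put the overnight idea in numbers, here is a minimal sketch of how many tokens a 3-4 t/s setup could produce in one run. The 8-hour window and the function name are assumptions made for illustration, and the estimate ignores prompt processing and any slowdown at longer contexts.

```python
def tokens_generated(tokens_per_second, hours):
    """Total tokens produced at a constant speed over a number of hours."""
    return int(tokens_per_second * hours * 3600)


# Assumed 8-hour overnight run at the 3-4 t/s quoted above.
for tps in (3, 4):
    print(f"{tps} t/s for 8 h: {tokens_generated(tps, 8)} tokens")
# 3 t/s for 8 h: 86400 tokens
# 4 t/s for 8 h: 115200 tokens
```

So one night comes to roughly 85k to 115k tokens, which is enough for a batch job but not for interactive use.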