LoganDark 8 hours ago
For inference, but yes. Many hundreds of tokens per second of output is the norm, in my experience. I don't recall the prompt-processing figures, but I think it was somewhere in the low hundreds of tokens per second (so slightly slower than inference).