| ▲ | ls612 an hour ago | |
10 tok/s is right around the borderline of being good for interactive use. I did the math and decoding is mostly bottlenecked by memory bandwidth, so once my 4090 gets retired from gaming service I can expect to run a similarly sized model on it and get ~25 tok/s, which will be very usable. | ||
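For anyone who wants to redo that math, here is a rough sketch of the usual bandwidth-bound estimate: each generated token has to stream the active weights from memory once, so tok/s is roughly bandwidth divided by bytes read per token. The function names, the `efficiency` fudge factor, and the numbers in the example are all placeholders I made up for illustration, not measurements of any particular model or card.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             gb_read_per_token: float,
                             efficiency: float = 1.0) -> float:
    """Upper-bound estimate of decode speed when memory bandwidth is the limit.

    bandwidth_gb_s    -- memory bandwidth of the device, in GB/s
    gb_read_per_token -- GB of weights streamed per generated token
                         (roughly the quantized size of the active parameters)
    efficiency        -- hypothetical fraction of peak bandwidth actually achieved
    """
    if bandwidth_gb_s <= 0 or gb_read_per_token <= 0:
        raise ValueError("bandwidth and bytes per token must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency * bandwidth_gb_s / gb_read_per_token


def bandwidth_scaled_rate(known_tok_s: float,
                          known_bandwidth_gb_s: float,
                          new_bandwidth_gb_s: float) -> float:
    """Scale a measured rate to another device, assuming the same model
    and that both devices are purely bandwidth-bound."""
    if known_bandwidth_gb_s <= 0 or new_bandwidth_gb_s <= 0:
        raise ValueError("bandwidths must be positive")
    return known_tok_s * new_bandwidth_gb_s / known_bandwidth_gb_s


if __name__ == "__main__":
    # Placeholder numbers: 10 tok/s on a 400 GB/s device, moved to a 1000 GB/s device.
    print(bandwidth_scaled_rate(10.0, 400.0, 1000.0))
    # Placeholder numbers: 1000 GB/s streaming 40 GB of weights per token.
    print(decode_tokens_per_second(1000.0, 40.0))
```

This ignores compute, KV-cache reads, and whether the model actually fits in VRAM, so treat the result as a ceiling rather than a prediction.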