| ▲ | vb-8448 9 hours ago | |
Actually, even with 9k of hardware you won't get good enough performance. There is an interesting video from antirez on trying to run deepseek v4 flash 2-bit on an M3 Max 128GB ... and the result is kind of disappointing: as soon as the context starts growing you are at around 20 tokens/s. | ||
| ▲ | 8 hours ago | parent | next [-] | |
| [deleted] | ||
| ▲ | zozbot234 9 hours ago | parent | prev [-] | |
Prefill performance used to be the real bottleneck on antirez's DS4, and that's been greatly improved by now; it doesn't perceptibly slow down with growing context. | ||