Aurornis 2 hours ago
Unfortunately not with a reasonable context length.
kkzz99 2 hours ago
It really depends on what you consider a reasonable context length, but I can get 50k-60k tokens on a 4090.
GaggiX an hour ago
The model uses Gated DeltaNet and Gated Attention layers, so its KV cache uses very little memory, even at BF16 precision.
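A rough back-of-the-envelope sketch of why a hybrid stack like this keeps the cache small. It assumes the Gated Attention layers still keep a conventional per-token KV cache, while each Gated DeltaNet layer keeps only a fixed-size recurrent state per head. The layer counts, head counts, and dimensions are hypothetical placeholders, not the model's actual config.

```python
# Rough KV-cache sizing for a hybrid linear/full-attention stack.
# All config numbers below are HYPOTHETICAL placeholders, not a real model's config.

BF16_BYTES = 2  # bytes per element at BF16


def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=BF16_BYTES):
    """Full (softmax) attention: K and V are stored for every token, so memory grows with seq_len."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def deltanet_state_bytes(n_linear_layers, n_heads, d_k, d_v, bytes_per_elem=BF16_BYTES):
    """Gated DeltaNet (assumed): one d_k x d_v state matrix per head, independent of seq_len."""
    return n_linear_layers * n_heads * d_k * d_v * bytes_per_elem


def hybrid_cache_bytes(seq_len):
    # Hypothetical hybrid: 12 gated-attention layers + 36 Gated DeltaNet layers.
    attn = kv_cache_bytes(seq_len, n_attn_layers=12, n_kv_heads=2, head_dim=256)
    state = deltanet_state_bytes(n_linear_layers=36, n_heads=32, d_k=128, d_v=128)
    return attn + state


def dense_cache_bytes(seq_len):
    # Same hypothetical model, but with all 48 layers using full attention.
    return kv_cache_bytes(seq_len, n_attn_layers=48, n_kv_heads=2, head_dim=256)


if __name__ == "__main__":
    gib = 1024 ** 3
    for n in (8_000, 60_000, 128_000):
        print(f"{n:>7} tokens: dense {dense_cache_bytes(n) / gib:5.2f} GiB, "
              f"hybrid {hybrid_cache_bytes(n) / gib:5.2f} GiB")
```

With these placeholder numbers, only a quarter of the layers store per-token K/V, and the DeltaNet state is a constant 36 MiB. The hybrid cache is therefore a bit over a quarter of the all-attention cache at short contexts and approaches a quarter as the context grows. Real models may differ, for example by keeping the recurrent state in FP32 or adding small convolution states.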