2001zhaozhao 6 hours ago
This is 128B dense though. The K/V cache on long context is going to be massive.

Havoc 5 hours ago
Don’t think KV cache size correlates with dense vs. MoE.

syntaxing 4 hours ago
With TurboQuant, you would reduce it by over 6x.
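
For anyone who wants to put rough numbers on the points above, here is a back-of-envelope sketch. It assumes standard grouped-query attention, and the layer count, KV head count, head dimension, and context length are made-up illustrative values, not the real config of the model being discussed. The 6x factor is just the figure quoted in the thread, applied as a plain divisor.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2.0, batch=1):
    """Approximate KV cache size in bytes for standard (grouped-query) attention."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical config, chosen only for illustration (assumption, not from the thread).
cfg = dict(n_layers=88, n_kv_heads=8, head_dim=128, seq_len=131072)

fp16 = kv_cache_bytes(**cfg)   # 16-bit cache, 2 bytes per element
compressed = fp16 / 6          # the "over 6x" reduction quoted above, taken at face value

print(f"{fp16 / GIB:.1f} GiB at 16-bit")
print(f"{compressed / GIB:.1f} GiB after 6x compression")
```

Note that the formula has no term for total parameter count or number of experts. Dense vs. MoE changes the feed-forward blocks, while the cache is driven by layer count, KV heads, head dimension, and context length.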