GaggiX 2 hours ago

The model uses Gated DeltaNet and Gated Attention, so the KV cache takes very little memory, even at BF16 precision.
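
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the Gated DeltaNet layers keep a fixed-size recurrent state instead of a per-token cache, so only the attention layers grow with context length. The layer counts, head sizes, and the 1-in-4 attention ratio below are made-up illustrative numbers, not taken from the model's config, and `kv_cache_bytes` is just a hypothetical helper.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV cache size: one K and one V tensor per attention layer,
    each holding n_kv_heads * head_dim values per token.
    bytes_per_elem=2 corresponds to BF16."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical configuration (assumed for illustration only).
TOTAL_LAYERS = 48
ATTN_LAYERS = TOTAL_LAYERS // 4  # assume 1 in 4 layers is full attention
KV_HEADS = 8
HEAD_DIM = 128
SEQ_LEN = 32768

# Every layer is attention vs. only a quarter of them.
full = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(ATTN_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

GIB = 2**30
print(f"all-attention: {full / GIB:.2f} GiB")
print(f"hybrid:        {hybrid / GIB:.2f} GiB")
```

The linear-attention layers still hold some state, but under this assumption it does not scale with sequence length, so it is left out of the estimate.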