cubefox 3 days ago

It would be interesting to see how this manifests in SSM/Mamba models. They handle their context window differently from Transformers, since they don't use the attention mechanism. Mamba scales better with context length than Transformers but is worse at explicit recall. Though that doesn't tell us how susceptible they are to context distractions.
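
For intuition, here's a toy sketch (not Mamba's actual selective scan, just untrained random projections and a scalar state decay) of why the two families differ: attention can directly look up any earlier token, while an SSM has to squeeze the whole history into a fixed-size state.

    import numpy as np

    d, T = 8, 16                       # toy hidden size and sequence length
    x = np.random.randn(T, d)          # token embeddings

    # Transformer-style causal attention: every step can attend to ALL past
    # tokens, so "explicit recall" of an early token is a direct lookup.
    def attention(x):
        q, k, v = x, x, x              # untrained toy projections (assumption)
        scores = q @ k.T / np.sqrt(d)
        mask = np.tril(np.ones((T, T)))
        scores = np.where(mask == 1, scores, -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v                   # compute grows ~T^2, memory grows with T

    # SSM/Mamba-style recurrence: the history is compressed into a fixed-size
    # state h, so old tokens only survive as a decayed trace. Real Mamba makes
    # A and B input-dependent ("selective"), which is what helps it filter
    # (or fail to filter) distractors.
    def ssm(x, A=0.9, B=0.1):
        h = np.zeros(d)
        outs = []
        for t in range(T):
            h = A * h + B * x[t]       # constant memory regardless of T
            outs.append(h.copy())
        return np.stack(outs)

Whether that compression makes distractors easier to forget or makes relevant early context easier to lose is exactly the open question.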