Remix.run Logo
cubefox 3 hours ago

Mamba-3 is an architecture while diffusion is, I believe, a type of objective. So these are not mutually exclusive and therefore not comparable.

gyrovagueGeist an hour ago | parent [-]

Not wrong, but I think it's more accurate to say:

Mamba is an architecture for the middle layers of the network (the trunk) which assumes decoding takes place through an autoregressive sequence (popping out tokens in order). This is the SSM they talk about.

Diffusion is an alternative to the autoregressive approach where decoding takes place through iterative refinement on a batch of tokens (instead of one at a time processing and locking each one in only looking forward). This can require different architectures for the trunk, the output heads, and modifications to the objective to make the whole thing trainable. Could mamba like ideas be useful in diffusion networks...maybe but it's a different problem setup.