Remix.run Logo
cs702 2 days ago

A superior alternative to standard Muon and AdamW optimizers for training large models.

Fantastic work, instantly valuable, immediately usable.

A big THANK YOU to the authors:

Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao