WithinReason 6 days ago

Great work releasing such a small model! I would like to know your thoughts on using 2/3 of the model's size for embeddings. What would be different if you used a byte-level vocabulary and spent the parameter budget on transformer parameters instead?
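
To put rough numbers on the question, here is a back-of-the-envelope sketch. The vocabulary size, hidden width, and total parameter count are made-up illustrative values, not the model's actual configuration, and it assumes a single embedding table (input and output embeddings tied).

```python
def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameters in one embedding table: one d_model-wide vector per token."""
    return vocab_size * d_model


# All values below are hypothetical, chosen only to illustrate the arithmetic.
D_MODEL = 640                 # assumed hidden width
SUBWORD_VOCAB = 256_000       # assumed large subword vocabulary
BYTE_VOCAB = 256              # one entry per possible byte value
TOTAL_PARAMS = 250_000_000    # assumed total parameter budget

subword_emb = embedding_params(SUBWORD_VOCAB, D_MODEL)
byte_emb = embedding_params(BYTE_VOCAB, D_MODEL)

# Parameters that could move into transformer layers with a byte-level vocabulary.
freed = subword_emb - byte_emb
embedding_share = subword_emb / TOTAL_PARAMS

print(f"subword embedding params: {subword_emb:,}")
print(f"byte-level embedding params: {byte_emb:,}")
print(f"freed for transformer layers: {freed:,}")
print(f"embedding share of total (subword): {embedding_share:.1%}")
```

One caveat the arithmetic leaves out: a byte-level vocabulary makes sequences longer for the same text, so the freed parameters come with more tokens to process per input.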