Remix.run Logo
anonymoushn 2 hours ago

their old tokenizer performed some space collapsing that allowed them to use the same token id for a word with and without the leading space (in cases where the context usually implies a space and one is not present, a "no space" symbol is used).