WatchDog 2 days ago
Perhaps. The word does have its own token: " geschniegelt" (geschniegelt with a leading space) is token 192786 in the tokenizer that GPT-5 apparently uses. https://raw.githubusercontent.com/niieani/gpt-tokenizer/refs...
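A minimal sketch of how one could check this with the linked library, assuming the gpt-tokenizer npm package exposes an o200k_base encoding entry point (the subpath is an assumption; it may differ):

```ts
// Assumed import path for the o200k_base encoding in niieani/gpt-tokenizer.
import { encode, decode } from 'gpt-tokenizer/encoding/o200k_base'

// If the claim holds, the string with a leading space encodes to a single id.
console.log(encode(' geschniegelt')) // expected per the comment above: [192786]
console.log(decode([192786]))        // expected: " geschniegelt"
```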
nextaccountic a day ago
Isn't giving this word its own token deeply wasteful, when some more common strings take multiple tokens? And how do they deal with Chinese? Are some ideograms multiple tokens?
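One could answer this empirically with the same assumed package and entry point; with byte-pair encoding over UTF-8 bytes, common ideograms often get a single token while rarer ones may split into multiple byte-level tokens. A sketch (the sample strings are arbitrary, and the per-string counts are not asserted here):

```ts
import { encode } from 'gpt-tokenizer/encoding/o200k_base'

// Compare token counts for a rare German word, a common English word,
// a common ideogram, and a rarer two-ideogram compound.
for (const s of [' geschniegelt', ' serendipitous', '猫', '饕餮']) {
  console.log(JSON.stringify(s), encode(s).length)
}
```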