ACCount37 5 days ago |
I don't think there's a direct link to the tokenizer - it's a higher-level capability. You can stitch together a nonsense word out of common "word fragment" tokens and see if that impairs the LLM's ability to recognize the word as nonsense.
Jensson 5 days ago | parent [-]
That is wrong. I just generated 5 random letters in Python and sent them to gpt-5, and it totally failed to answer properly: it said "Got it, whats up :)" even though what I wrote isn't recognizable as anything at all. The "capability" you see is the LLM recognizing a human-typed random string, since human-typed random strings are not very random. If you send it an actually random word, it typically fails.
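For reference, the generation step was just something along these lines (the call to the model itself isn't shown):

```python
import random
import string

# Build a 5-letter string with uniformly random letters - unlike a
# human mashing the keyboard, this has no keyboard-layout bias.
word = "".join(random.choices(string.ascii_lowercase, k=5))
print(word)
```

Paste the output into a fresh chat with no other context and see whether the model treats it as a nonsense word.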