ACCount37 5 days ago

I don't think there's a direct link to the tokenizer - it's a higher level capability. You can stitch together a nonsense word out of common "word fragment" tokens and see if that impairs the LLM's ability to recognize the word as nonsense.
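
A minimal sketch of that test, for anyone who wants to try it. The fragment list below is a hypothetical stand-in, not taken from any real tokenizer vocabulary, and the function name is made up for illustration. Some combinations may accidentally spell a real word, so check the output before sending it to a model.

```python
import random

# Hypothetical list of common word fragments; NOT drawn from a real tokenizer.
FRAGMENTS = ["ing", "tion", "pre", "un", "able", "ment", "con", "er"]

def stitched_word(k=3, seed=None):
    """Join k distinct fragments into one (probably) nonsense word."""
    rng = random.Random(seed)  # seed only makes the result reproducible
    return "".join(rng.sample(FRAGMENTS, k))

if __name__ == "__main__":
    print(stitched_word())
```

To confirm that a given word really splits into the fragments you intended, you would have to run it through the target model's actual tokenizer, which this sketch does not do.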

Jensson 5 days ago | parent [-]

That is wrong. I just generated 5 random letters in Python and sent them to GPT-5, and it totally failed to answer properly: it said "Got it, what's up :)" even though what I wrote isn't recognizable at all.

The "capability" you see is the LLM recognizing a human-typed random string, since human-typed random strings are not very random. If you send it an actual random word, it typically fails.
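
For reference, one way to generate such a string in Python looks roughly like this. It is a sketch, not the exact code used above; the function name and the lowercase-only alphabet are assumptions.

```python
import random
import string

def random_letters(n=5, seed=None):
    """Return n letters drawn uniformly from a-z (lowercase is an assumption)."""
    rng = random.Random(seed)  # pass a seed only if you want repeatable output
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))

if __name__ == "__main__":
    print(random_letters())
```

Unlike keyboard mashing, each letter here is independent and uniformly distributed, which is the difference between human-typed and actual random strings being argued about.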

pfg_ 4 days ago | parent [-]

I tried this four times, every time it recognized it as nonsense.

typpilol 4 days ago | parent [-]

Same