As in, this stuff happens at the tokenizer / internal representation layer? Sorry, can you help me understand why we can't sanitize it?