| ▲ | quamserena 12 hours ago | |
Including RTL-LTR flips, character substitutions, etc.? I think Unicode is vast enough that it’s possible to evade any filter and still look text-like enough to the end user. And how could you possibly know whether it’s really a Greek question mark or someone just trying to mess with your AI? | ||
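To make the concern concrete, here is a minimal Python sketch of what a simple filter might do, and where it falls short. The function name `naive_filter` and the choice of NFKC normalization plus stripping explicit bidirectional controls are assumptions made for illustration, not a filter anyone in this thread described.

```python
import unicodedata

# Explicit bidirectional controls: marks, embeddings, overrides, isolates.
BIDI_CONTROLS = set(
    "\u061c\u200e\u200f"              # ALM, LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)

def naive_filter(text: str) -> str:
    """Hypothetical filter: drop bidi controls, then apply NFKC normalization."""
    stripped = "".join(ch for ch in text if ch not in BIDI_CONTROLS)
    return unicodedata.normalize("NFKC", stripped)

# U+037E GREEK QUESTION MARK canonically decomposes to an ASCII semicolon,
# so normalization cannot tell the two apart afterwards.
print(naive_filter("\u037e") == ";")             # True

# A right-to-left override (U+202E) is removed.
print(naive_filter("ab\u202ecd") == "abcd")      # True

# Cyrillic U+0430 looks like Latin "a" but has no decomposition to it,
# so this lookalike passes through the filter unchanged.
print(naive_filter("p\u0430ypal") == "paypal")   # False
```

The last case is the point: normalization folds some lookalikes together, but cross-script homoglyphs survive it, so a filter of this kind only narrows the space of substitutions.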
| ▲ | zahlman an hour ago | parent | next [-] | |
I assume that anyone trying to "filter" the text could just render it and then OCR it. | ||
| ▲ | Sabinus 12 hours ago | parent | prev [-] | |
Ultimately the AI will just learn that those tokens are basically the same thing. You'll just be slowing its learning down by some (probably tiny) amount. | ||