ronsor 12 hours ago
> text obfuscation against LLM scrapers

Nice! But we already filter this stuff before pretraining.
quamserena 12 hours ago
Including RTL-LTR flips, character substitutions, etc.? I think Unicode is vast enough that it's possible to evade any filter and still look text-like enough to the end user. And how could you possibly know whether it's really a Greek question mark or someone just trying to mess with your AI?
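
To make the disagreement concrete, here is a minimal sketch of what a naive pre-filter might do, assuming it only strips bidirectional control characters and applies Unicode normalization. The helper name `naive_clean` and the choice of NFKC are assumptions for illustration, not a description of any real pretraining pipeline.

```python
import unicodedata

# Bidirectional control characters (marks, embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def naive_clean(text: str) -> str:
    """Hypothetical filter: drop bidi controls, then NFKC-normalize.

    Dropping an override only removes the control character; it does not
    restore text that was deliberately written in reversed order.
    """
    stripped = "".join(ch for ch in text if ch not in BIDI_CONTROLS)
    return unicodedata.normalize("NFKC", stripped)


# U+037E GREEK QUESTION MARK canonically decomposes to U+003B SEMICOLON,
# so normalization erases the distinction whether or not it was intended.
greek = naive_clean("why\u037e")

# A Cyrillic "а" (U+0430) looks like a Latin "a" but has no decomposition,
# so it passes through this filter unchanged.
spoofed = naive_clean("p\u0430ssword")
```

Under these assumptions the filter folds the Greek question mark into a plain semicolon, which discards genuine Greek punctuation too, while the Cyrillic look-alike passes through untouched. Catching that case would need a separate confusables mapping, and even then the filter cannot tell intent from legitimate mixed-script text.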