Remix.run Logo
ks2048 4 days ago

It's worth noting that Unicode already defines a "General Category" for all code points that categorizes some of these types of "weird" characters.

https://en.wikipedia.org/wiki/Unicode_character_property#Gen...

e.g. in Python,

   import unicodedata
   print(unicodedata.category(chr(0)))
   print(unicodedata.category(chr(0xdead)))
Shows "Cc" (control) and "Cs" (surrogate).