We have a whitelist of allowed characters, which is a pretty big list.
I think we based it on the source code of Lodash's deburr. If deburr's output contains only a-z and some common symbols, the value passes (and we store the original value, not the deburred one).
https://www.geeksforgeeks.org/lodash-_-deburr-method/
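
Here is a minimal sketch of that check, not our actual code. To keep it self-contained it does not call Lodash: `stripDiacritics` is a stand-in that uses Unicode NFD normalization, which covers fewer characters than deburr (it will not convert "ø" or "ß", for example). The names `stripDiacritics`, `ALLOWED`, and `validateName` are made up for illustration, and so is the exact symbol set in the regex, including the assumption that uppercase letters are allowed.

```typescript
// Stand-in for Lodash's deburr (assumption, not the real implementation):
// decompose accented letters with NFD, then drop combining diacritical marks.
function stripDiacritics(input: string): string {
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical allowlist: ASCII letters plus a few common symbols.
const ALLOWED = /^[A-Za-z .,'-]+$/;

// Returns the original (un-deburred) value if its deburred form passes
// the allowlist, otherwise null.
function validateName(original: string): string | null {
  return ALLOWED.test(stripDiacritics(original)) ? original : null;
}
```

With this sketch, `validateName("José")` gives back `"José"` unchanged, while a value whose stripped form still contains characters outside the allowlist gives `null`.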