develatio 4 days ago
I was not able to understand why these code points are bad. The post states that they are bad, but why? Are there any examples, real-world situations, or a PoC that might help me understand how they would break "my code"?
orangeboats 4 days ago
Sometimes it's not just "your code". Strings are often interchanged and sent to many other parties. Some of the code points, such as the surrogate code points (which MUST come in pairs in properly encoded UTF-16), may not break your code but will break poorly written, spaghetti-ridden UTF-16-based hellholes that do not expect unpaired surrogates. Something like:

1. You send a UTF-8 string containing normal characters and an unpaired surrogate, "Hello \uDEADworld", to FooApp.
2. FooApp converts the UTF-8 string to UTF-16 and saves it in a file, all without validation, so nothing crashes yet; worst case, the unpaired surrogate is rendered by the frontend as "�".
3. The next time FooApp reads the file, it expects well-formed UTF-16 and crashes because of the unpaired surrogate.

(A more dangerous failure mode of step 3 is an out-of-bounds memory read, if an unpaired high surrogate is the last code unit in the string and the decoder reads past the end looking for its partner.)
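The three steps above can be roughly reproduced in Python. This is only a sketch: `lenient_utf8_to_utf16` and `strict_utf16_read` are hypothetical stand-ins for FooApp's writer and reader, and the `surrogatepass` error handler is used here to imitate a converter that does no validation.

```python
def lenient_utf8_to_utf16(data: bytes) -> bytes:
    """Hypothetical FooApp writer (step 2): converts without validating."""
    # "surrogatepass" lets lone surrogates through instead of raising.
    text = data.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


def strict_utf16_read(data: bytes) -> str:
    """Hypothetical FooApp reader (step 3): expects well-formed UTF-16."""
    return data.decode("utf-16-le")  # strict by default


# Step 1: a "UTF-8" payload that smuggles in the lone surrogate U+DEAD.
payload = "Hello \udeadworld".encode("utf-8", errors="surrogatepass")

# Step 2: the lenient conversion succeeds and the bad data gets stored.
stored = lenient_utf8_to_utf16(payload)

# Step 3: the strict reader chokes on the unpaired surrogate.
try:
    strict_utf16_read(stored)
    reader_failed = False
except UnicodeDecodeError:
    reader_failed = True

# Validating at the boundary would have rejected the payload up front.
try:
    payload.decode("utf-8")  # strict UTF-8 decoding
    rejected_at_boundary = False
except UnicodeDecodeError:
    rejected_at_boundary = True
```

Python raises an exception here rather than reading out of bounds, but the shape of the problem is the same: the component that accepts the data is not the one that ends up failing on it.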
JimDabell 4 days ago
Suppose, when you were registering your username `develatio`, you decided to put U+202E RIGHT-TO-LEFT OVERRIDE in there as well. Now when somebody is reading this page and their browser gets to your username, it switches the text direction to render it right-to-left.
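A registration form could guard against this with something like the following. It is a minimal sketch, not a complete policy: `has_bidi_controls` is a hypothetical helper, and it only looks at the explicit bidi formatting classes reported by Python's `unicodedata` module (so it does not catch, for example, the U+200E/U+200F directional marks).

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls
# (U+202A..U+202E and U+2066..U+2069).
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF",
    "LRI", "RLI", "FSI", "PDI",
}


def has_bidi_controls(name: str) -> bool:
    """Return True if the string contains an explicit bidi control."""
    return any(
        unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
        for ch in name
    )


plain = "develatio"
spoofed = "\u202edevelatio"  # U+202E RIGHT-TO-LEFT OVERRIDE prepended

plain_flagged = has_bidi_controls(plain)
spoofed_flagged = has_bidi_controls(spoofed)
```

Whether to reject such names outright or strip the controls is a product decision; the point is that the check has to happen before the string is stored and shown to other users.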