deepsun 4 days ago
That assumes the text is not corrupted or maliciously modified. There have been (and still are) _numerous_ vulnerabilities caused by parsing or escaping invalid UTF-8 sequences. A quick search turns up examples (not all of them on-topic, though): https://www.rapid7.com/blog/post/2025/02/13/cve-2025-1094-po...
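To make "invalid UTF-8 sequences" concrete, here is a minimal sketch using Python's built-in strict decoder. The helper name `decode_strict` and the sample byte strings are illustrative assumptions, not taken from any of the linked reports.

```python
from typing import Optional


def decode_strict(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if data is not well-formed UTF-8."""
    try:
        # errors="strict" raises instead of silently replacing bad bytes.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


# Hypothetical inputs chosen for illustration.
samples = {
    "valid": "é".encode("utf-8"),    # b'\xc3\xa9'
    "overlong_slash": b"\xc0\xaf",   # overlong two-byte form of '/'
    "stray_continuation": b"\x80",   # continuation byte with no lead byte
    "truncated": b"\xe2\x82",        # first two bytes of a three-byte sequence
}

for name, raw in samples.items():
    print(name, decode_strict(raw))
```

Only the first sample decodes; the other three are rejected. Trouble tends to start when a lenient decoder or an escaping routine accepts such bytes instead of refusing them.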
restalis 3 days ago
This tendency toward requirement overloading, for what could otherwise be a simple solution to a simple problem, is the bane of engineering. In this case, if security is important, it can be addressed separately, e.g. by treating the underlying text as an abstract block of information that is packaged with corresponding error-checking codes and then checked for integrity before consumption. The UTF-8 encoding/decoding process itself doesn't necessarily have to answer the security concerns. Please let solutions be simple whenever they can be.
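A rough sketch of that separation, assuming a shared secret key and an HMAC-SHA256 tag prepended to the payload. The `pack`/`unpack` names and the wire layout are made up for illustration; the point is only that the integrity check happens on opaque bytes before any UTF-8 decoding.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def pack(text: str, key: bytes) -> bytes:
    """Encode text and prepend an HMAC tag computed over the encoded bytes."""
    payload = text.encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def unpack(blob: bytes, key: bytes) -> str:
    """Verify the tag first; decode only if the bytes are untouched."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload.decode("utf-8")


key = b"demo key, not for real use"  # hypothetical key
blob = pack("naïve café", key)
print(unpack(blob, key))
```

A plain checksum would catch accidental corruption only; a keyed tag is used here because the parent comment also mentions malicious modification.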
s1mplicissimus 4 days ago
I was just wondering something similar: if 10 implies the start of a character, doesn't that require 10 to never occur inside the other bits of a character?
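For reference, a small sketch of how the prefix check works. In UTF-8 as specified, `10` marks a continuation byte rather than a start (lead bytes begin with `0` or `11`), and the marker is read only from the top bits of each byte, so a `10` bit pattern inside the payload bits is harmless. The function names `byte_kind` and `char_start` are hypothetical.

```python
def byte_kind(b: int) -> str:
    """Classify one byte by its top bits only; payload bits are ignored."""
    if (b & 0b1000_0000) == 0:
        return "ascii"         # 0xxxxxxx
    if (b & 0b1100_0000) == 0b1000_0000:
        return "continuation"  # 10xxxxxx
    return "lead"              # 11xxxxxx (this sketch does not validate further)


def char_start(data: bytes, i: int) -> int:
    """Walk back from index i to the first byte of the character containing it."""
    while i > 0 and byte_kind(data[i]) == "continuation":
        i -= 1
    return i


euro = "€".encode("utf-8")  # bytes E2 82 AC
for b in euro:
    print(f"{b:08b}", byte_kind(b))

data = "a€b".encode("utf-8")  # bytes 61 E2 82 AC 62
print(char_start(data, 3))    # index 3 is the last byte of the euro sign
```

The second byte of the euro sign is `10000010`: its payload bits `000010` contain a `10` pair, yet only the leading `10` is treated as the marker because the check is anchored at the byte boundary.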