77pt77 14 hours ago

> If you treat your strings as opaque blobs, and use UTF8, most of internationalization problems go away

This is laughably naive.

So many things can go wrong.

Strings are not arrays of bytes.

There is a price to pay if someone doesn't understand that or chooses to ignore it.

shakna 9 hours ago | parent | next [-]

> Strings are not arrays of bytes.

That very much depends on the language that you are using. In some, they are.

hughesjj 14 hours ago | parent | prev | next [-]

RTL go brrr

rpigab 7 hours ago | parent [-]

RTL is so much fun, it's the gift that keeps on giving. When I first encountered it I thought: OK, maybe some junior web app developers will sometimes forget that it exists and a fun bug or two will get into production. But it's everywhere: Windows, GNU/Linux, automated emails. It can even make malware harder for users to detect in Windows, because you can hide the .exe at the beginning of the filename, etc.

Here it is today in GNOME 46.0, after so many years. This should say "selected": https://github.com/user-attachments/assets/306737fb-6b01-467... In previous GNOME versions it would mess up even more text in the file properties window.

Here's an article about it, but I couldn't find the more interesting blogpost about RTL: https://krebsonsecurity.com/2011/09/right-to-left-override-a...
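
For anyone who hasn't seen the filename trick, here is a minimal Python sketch of it. The filename and the `has_bidi_controls` helper are made up for illustration, and the set of control characters is the usual list of explicit bidi embeddings, overrides, and isolates, not something taken from the linked article. The point is that the string really ends in ".exe" even though a bidi-aware UI would likely draw everything after the override reversed, so it looks like it ends in ".jpg".

```python
# Explicit bidirectional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def has_bidi_controls(name: str) -> bool:
    """Return True if the name contains any explicit bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in name)


# Hypothetical malicious filename: U+202E (RIGHT-TO-LEFT OVERRIDE) sits
# right after "invoice", so the tail "gpj.exe" is typically rendered reversed.
filename = "invoice\u202egpj.exe"

# What the operating system sees: the real extension is still .exe.
real_extension_is_exe = filename.endswith(".exe")

# A cheap check a file manager or mail filter could apply before display.
suspicious = has_bidi_controls(filename)

# Stripping the controls shows the name in its stored (logical) order.
logical_name = "".join(ch for ch in filename if ch not in BIDI_CONTROLS)
```

Flagging or stripping these characters is only one possible mitigation, and it has a cost: some legitimate RTL filenames use them on purpose.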

lelandbatey 10 hours ago | parent | prev [-]

And yet, when stored on any computer system, that string will be encoded using some number of bytes, and you can set a limit on that byte count even though you cannot cut, delimit, or make any other inference about the string from its bytes without doing some kind of interpretation. But the byte limit is enough for the situation the OP is talking about.
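
A rough Python sketch of that distinction, assuming UTF-8 storage. The `fits_byte_limit` helper and the sample string are invented for the example. Checking the encoded length needs no knowledge of what the text means, while slicing the bytes at an arbitrary offset can land in the middle of a character.

```python
def fits_byte_limit(text: str, max_bytes: int) -> bool:
    """Accept or reject a string purely by its UTF-8 encoded size."""
    return len(text.encode("utf-8")) <= max_bytes


# "Zoë" plus a space and an emoji: 5 code points, 9 bytes in UTF-8
# (Z=1, o=1, ë=2, space=1, emoji=4).
name = "Zo\u00eb \U0001F600"
encoded = name.encode("utf-8")

code_points = len(name)
byte_count = len(encoded)

# Enforcing a limit treats the string as an opaque blob and works fine.
accepted = fits_byte_limit(name, 9)
rejected = not fits_byte_limit(name, 8)

# Cutting the blob is a different matter: 7 bytes ends inside the emoji.
try:
    encoded[:7].decode("utf-8")
    cut_was_valid = True
except UnicodeDecodeError:
    cut_was_valid = False
```

So rejecting over-long input by byte count is safe, but truncating to fit requires decoding, and arguably grapheme-aware handling, which is where the "strings are not arrays of bytes" objection applies.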