tialaramex a day ago
I don't think Rust gets this horribly wrong. A &str is some bytes which we've agreed are UTF-8 encoded text. So it's not a sequence of graphemes, though it promises it could be interpreted that way, and it is a sequence of bytes, but not just any bytes.

In Rust, "AbcdeF"[1] isn't a thing; it won't compile. But "AbcdeF"[1..=1] says we want the UTF-8 substring from byte 1 through byte 1, and that compiles, and it works because the string does have a valid UTF-8 substring there: it's "b". However, "€300"[1..=1] panics, because byte 1 falls inside the three-byte encoding of '€', so the range doesn't describe a valid UTF-8 substring.

For app dev this is too low level, but it's nice to have a string abstraction that's at home on a small embedded device, where it doesn't matter whether I can interpret flags, emoji with skin-tone modifiers, or whatever else as a single distinct Unicode grapheme, but we would still like to do a bit better than "Only ASCII works in this device" in 2025.
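To make that concrete, a minimal sketch of the behaviour described above, using nothing beyond std:

    fn main() {
        let s = "AbcdeF";
        // let c = s[1];            // does not compile: `str` cannot be indexed by a bare usize
        assert_eq!(&s[1..=1], "b"); // byte range 1..=1 lands on char boundaries, so this is fine

        let euro = "€300";          // '€' is three bytes in UTF-8 (0xE2 0x82 0xAC)
        // &euro[1..=1];            // would panic: byte 1 is inside '€', not a char boundary
        assert_eq!(&euro[3..=3], "3"); // slicing on a boundary works
    }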
Someone a day ago
> I don't think Rust gets this horribly wrong

> In Rust, "AbcdeF"[1] isn't a thing; it won't compile. But "AbcdeF"[1..=1] says we want the UTF-8 substring from byte 1 through byte 1, and that compiles, and it works because the string does have a valid UTF-8 substring there: it's "b". However, "€300"[1..=1] panics

I disagree. IMO, an API that uses byte offsets to take substrings of Unicode code points (or even larger units?) is already a bad idea, but then having it panic when the byte offsets don't happen to fall on code point or (extended) grapheme cluster boundaries? How are you supposed to use that when, as you say, "we would like to do a bit better than "Only ASCII works in this device" in 2025"?

I see there's a better API that doesn't panic (https://doc.rust-lang.org/std/primitive.str.html#method.get), but IMO that still isn't as nice as Swift's choice, because it still uses byte offsets.
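For reference, a small sketch of the non-panicking API linked above: str::get returns an Option instead of panicking, but the indices it takes are still byte offsets:

    fn main() {
        let euro = "€300";
        assert_eq!(euro.get(1..=1), None);      // byte 1 is inside '€', so no substring
        assert_eq!(euro.get(3..=3), Some("3")); // on a char boundary, so this works
    }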
zzo38computer a day ago
You can do better than "only ASCII works in this device", but making the default string type Unicode is the wrong way to do it. Some applications don't need to interpret text at all, or only need to interpret the ASCII subset even when the text isn't purely ASCII. Other times you will want to do more, but Unicode is not a very good character set; there are others, and which one is appropriate depends heavily on the application (sometimes none are). Even if you are using Unicode, you still don't need a Unicode string type, and you don't need every string operation to check for valid UTF-8 by default, because that leads to inefficiency.
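A rough illustration of that byte-oriented approach (the ascii_upper name here is just made up for the example): operate on &[u8], interpret only the ASCII subset, and pass every other byte through untouched, with no UTF-8 validation anywhere:

    // Uppercase only ASCII letters; leave all other bytes (including any
    // multi-byte UTF-8 sequences) exactly as they are.
    fn ascii_upper(input: &[u8]) -> Vec<u8> {
        input.iter().map(|b| b.to_ascii_uppercase()).collect()
    }

    fn main() {
        // The trailing two bytes are UTF-8 for 'é' and pass through unchanged.
        assert_eq!(ascii_upper(b"caf\xc3\xa9"), b"CAF\xc3\xa9".to_vec());
    }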