Someone, a day ago:
> I don't think Rust gets this horribly wrong
> In Rust "AbcdeF"[1] isn't a thing, it won't compile, but "AbcdeF"[1..=1] says we want the UTF-8 substring starting from byte 1 through to byte 1, and that compiles, and it'll work because that string does have a valid UTF-8 substring there, it's "b" -- However it'll panic if we try "€300"[1..=1]

I disagree. IMO, an API that uses byte offsets to take substrings of Unicode text (code points, or even larger units?) is already a bad idea, but then having it panic when the byte offsets do not happen to fall on code point/(extended) grapheme cluster boundaries? How are you supposed to use that when, as you say, "we would like to do a bit better than 'Only ASCII works in this device' in 2025"?

I see there's a better API that doesn't panic (https://doc.rust-lang.org/std/primitive.str.html#method.get), but IMO that still isn't as nice as Swift's choice, because it still uses byte offsets.
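A minimal sketch of the two behaviours being compared (the panicking slice syntax versus the non-panicking str::get), assuming a current stable Rust toolchain:

    fn main() {
        let ascii = "AbcdeF";
        // Slicing by a byte range compiles and works when both ends
        // of the range fall on UTF-8 character boundaries.
        assert_eq!(&ascii[1..=1], "b");

        let euro = "€300"; // '€' occupies bytes 0..3 in UTF-8
        // This line would panic at runtime: byte 1 is inside the '€' sequence.
        // let _ = &euro[1..=1];

        // str::get returns None instead of panicking on a bad boundary.
        assert_eq!(euro.get(1..=1), None);
        assert_eq!(euro.get(0..3), Some("€"));
    }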
tialaramex, a day ago:
> How are you supposed to use that [...]?

It's often the case that we know where a substring we want starts and ends, so this operation makes sense - because we know there's a valid substring there, it won't panic. For example, if we know there's a literal colon at bytes 17 and 39 in our string foo, then foo[18..39] is the UTF-8 text from bytes 18 to 38 inclusive, i.e. the string between those colons.

One source of confusion here is not realising that UTF-8 is a self-synchronising encoding. There are a lot of tricks that are correct and fast with UTF-8 but would be a disaster in other multi-byte encodings, or if (which is never the case in Rust) this weren't actually a UTF-8 string.
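A small sketch of that pattern, using a made-up string: byte offsets obtained from the string itself (here via find/rfind) are guaranteed to lie on character boundaries, so slicing with them cannot panic:

    fn main() {
        // Hypothetical input with non-ASCII text between two colons.
        let foo = "label:café au lait:€3";
        let start = foo.find(':').unwrap();  // byte offset of the first colon
        let end = foo.rfind(':').unwrap();   // byte offset of the last colon
        // ':' is one byte, so start + 1 is also a valid boundary.
        let between = &foo[start + 1..end];
        assert_eq!(between, "café au lait");
    }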