| ▲ | formerly_proven 20 hours ago |
| strncpy is fairly easy: it's a special-purpose function for copying a C string into a fixed-width string, of the kind typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array. A couple of years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en |
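A minimal sketch of the fixed-width use case described above, assuming a hypothetical `struct user_record` with the 20-byte `username` field from the comment. The helper names `set_username` and `username_len` are illustrative, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical on-disk record. The field is fixed-width and NUL-padded;
   it is NOT guaranteed to be NUL-terminated when all 20 bytes are used. */
struct user_record {
    char username[20];
};

/* Store a C string into the fixed-width field. strncpy copies at most
   sizeof field bytes and fills any unused remainder with NULs. */
static void set_username(struct user_record *rec, const char *name)
{
    assert(rec != NULL && name != NULL);
    strncpy(rec->username, name, sizeof rec->username);
}

/* Length of the stored name, scanning no further than the field width. */
static size_t username_len(const struct user_record *rec)
{
    const char *end = memchr(rec->username, '\0', sizeof rec->username);
    return end ? (size_t)(end - rec->username) : sizeof rec->username;
}
```

Note that reading the field back goes through `memchr` with the field width rather than `strlen`, since a name of exactly 20 characters leaves no terminator.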
|
| ▲ | Cyph0n 20 hours ago | parent | next [-] |
| strncpy doesn’t handle overlapping buffers (undefined behavior). Better to use strncpy_s (if you can) as it is safer overall. See: https://en.cppreference.com/w/c/string/byte/strncpy.html. As an aside, this is part of the reason why there are so many C successor languages: you can end up with undefined behavior if you don’t always carefully read the docs. |
| |
| ▲ | formerly_proven 17 hours ago | parent | next [-] | | > strncpy doesn’t handle overlapping buffers (undefined behavior). It would make little sense for strncpy to handle this case, since, as I pointed out above, it converts between different kinds of strings. | |
| ▲ | Asooka 19 hours ago | parent | prev [-] | | Back when strncpy was written there was no undefined behaviour (as the compiler interprets it today). The result would depend on the implementation and might differ between invocations, but it was never the "this will not happen" footgun of today. The modern interpretation of undefined behaviour in C is a big blemish on the otherwise excellent standards committee, committed (hah) in the name of extremely dubious performance claims. If "undefined" meaning "left to the implementation" was good enough when CPU frequency was measured in MHz and nobody had more than one, surely it is good enough today too. Also I'm not sure what you mean by C successor languages not having undefined behaviour, as both Rust and Zig inherit it wholesale from LLVM. At least last I checked that was the case, correct me if I am wrong. Go, Java and C# all have sane behaviour, but those are much higher level. | | |
| ▲ | Cyph0n 18 hours ago | parent | next [-] | | The problem isn't undefined behavior per se; I was using it as an example for strncpy. Rust is a no - in fact, the goal of (safe) Rust is to eliminate undefined behavior. Zig on the other hand I don't know about. In general, I see two issues at play here: 1. C relies heavily on unsized pointers (vs. fat pointers), which is why strncpy_s had to "break" strncpy in order to improve bounds checks. 2. strncpy memory aliasing restrictions are not encoded in the API and can only be conveyed through docs. This is a footgun. For (1), Rust APIs of this type operate on sized slices, or in the case of strings, string slices. Zig defines strings as sized byte slices. For (2), Rust enforces this invariant via the borrow checker by disallowing (at compile-time) a shared slice reference that points to an overlapping mutable slice reference. In other words, an API like this is simply not possible to define in (safe) Rust, which means you (as the user) do not need to pore over the docs for each stdlib function you use looking for memory-related footguns. | | |
| ▲ | loeg 10 hours ago | parent [-] | | > For (2), Rust enforces this invariant via the borrow checker by disallowing (at compile-time) a shared slice reference that points to an overlapping mutable slice reference. At least the last time I cared about this, the borrow checker wouldn't allow mutable and immutable borrows from the same underlying object, even if they did not overlap. (Which is more restrictive, in an obnoxious way.) | | |
| ▲ | Cyph0n 9 hours ago | parent [-] | | Do you mean borrows for different fields of a struct? If so, that’s handled today - it’s sometimes called “splitting borrows”: https://doc.rust-lang.org/nomicon/borrow-splitting.html | | |
| ▲ | loeg 9 hours ago | parent [-] | | Not exactly -- independent subranges of the same range (as would be relevant to something like memcpy/memmove/strcpy). E.g., https://godbolt.org/z/YhGajnhEG It's mentioned later in the same article you shared above. | | |
| ▲ | oneshtein 9 minutes ago | parent | next [-] | | fn f() {
let mut v = vec![1, 2, 3, 4, 5];
let (header, tail) = v.split_at_mut(1);
b(&header[0], &mut tail[0]);
}
| |
| ▲ | Cyph0n 9 hours ago | parent | prev [-] | | Gotcha. There is a split_at_mut method that splits a mutable slice reference into two. That doesn’t address the problem you had, but I think that’s the best you can do with safe Rust. | | |
| ▲ | loeg 8 hours ago | parent [-] | | Yeah. It just isn't something the borrow checker natively understands. |
|
|
|
|
| |
| ▲ | tialaramex 10 hours ago | parent | prev [-] | | Rust's safe subset doesn't have UB. At all. So long as you never write the "unsafe" keyword you're fine: the compiler will check you are obeying all of the language rules at all times. Whereas in C, oops, sorry, you broke a rule you didn't even know existed and so that's Undefined Behaviour left and right. Some of it you could argue falls into the category you're describing, where in a better world it should have been made Implementation Defined, not UB, and too bad. However lots of it is just because the language was designed a very long time ago and prioritized ease of implementation. If you wish the language was properly defined, you should use (safe) Rust. If you just wish that when you write nonsense the compiler should somehow guess what you meant and do that, you're not actually a programmer, find a practice which suits you better - take up knitting, learn to paint, something like that. |
|
|
|
| ▲ | dundarious 19 hours ago | parent | prev | next [-] |
| Yes, these were also common in several wire formats I had to use for market data/entry. You would think char symbol[20] would be inefficient for such performance sensitive software, but for the vast majority of exchanges, their technical competencies were not there to properly replace these readable symbol/IDs with a compact/opaque integer ID like a u32. Several exchanges tried and they had numerous issues with IDs not being "properly" unique across symbol types, or time (restarts intra-day or shortly before the open were a common nightmare), etc. A char symbol[20] and strncpy was a dream by comparison. |
|
| ▲ | ufo 20 hours ago | parent | prev | next [-] |
| A big footgun with strncpy is that the output string may not be null terminated. |
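To make the footgun concrete, here is a small sketch. The helpers `is_terminated` and `copy_checked` are hypothetical names introduced for illustration; the only library calls are `strncpy` and `memchr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf contains a NUL somewhere within its first size bytes. */
static bool is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical wrapper: run strncpy, then report whether the result
   can safely be treated as a C string. */
static bool copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, dst_size);   /* writes no NUL if strlen(src) >= dst_size */
    return is_terminated(dst, dst_size);
}
```

With a 4-byte destination, copying "abc" yields a terminated buffer, while copying "abcd" fills all four bytes and leaves no terminator for `strlen` or `printf("%s")` to find.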
| |
| ▲ | kccqzy 20 hours ago | parent [-] | | Yeah but fixed width strings don’t need null termination. You know exactly how long the string is. No need to find that null byte. | | |
| ▲ | ninkendo 20 hours ago | parent | next [-] | | Until you pass them as a `char *` by accident and it eventually makes its way to some code that does expect null termination. There are languages where you can be quite confident your string will never need null termination… but C is not one of them. | | |
| ▲ | kccqzy 19 hours ago | parent | next [-] | | You don’t do that by accident. Fixed-width strings are thoroughly outdated and unusual. Your mental model of them is very different from regular C strings. | | |
| ▲ | arka2147483647 17 hours ago | parent | next [-] | | Sadly, all the bug trackers are full of bugs relating to char*. So you very much do those by accident. And in C, fixed-width strings are not in any way rare or unusual. Go to any C codebase and you will find stuff like: char buf[12];
sprintf(buf, "%s%s", this, that); // or
strcat(buf, ...) // or
strncpy(buf, ...) // and so on..
| | |
| ▲ | snickerbockers 15 hours ago | parent | next [-] | | That's only really a problem if this and that are coming from an external source and have not been truncated. I really don't see this as any more significant of a problem than all the many high level scripting languages where you can potentially inject code into a variable and interpret it. There are certainly ways in which the C library could've been better (e.g., making strncpy handle the case where the source string is longer than n) but ultimately it will always need to operate under the assumption that the people using it are both competent and acting in good faith. | |
| ▲ | kccqzy 14 hours ago | parent | prev [-] | | When you write such code your mental model is C strings, not fixed-width strings, the intended use case for strncpy. |
| |
| ▲ | ninkendo 17 hours ago | parent | prev [-] | | The mental model doesn’t matter, it’s the compiler’s model that is going to bite you. If the compiler doesn’t reject it, it will happen eventually. |
| |
| ▲ | 19 hours ago | parent | prev [-] | | [deleted] |
| |
| ▲ | Sharlin 20 hours ago | parent | prev [-] | | Good luck though remembering not to pass one to any function that does expect to find a null terminator. | | |
| ▲ | kevin_thibedeau 19 hours ago | parent | next [-] | | Ignore the prefix and always treat strncpy() as a special binary data operation for an era where shaving bytes on storage was important. It's for copying into a struct with array fields or direct to an encoded block of memory. In that context you will never be dependent on the presence of NUL. The only safe usage with strings is to check for NUL on every use or wrap it. At that point you may as well switch to a new function with better semantics. | | |
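The "wrap it" option mentioned above could look roughly like this. `fixed_to_cstr` is a hypothetical helper, not a standard function; it converts a fixed-width, possibly unterminated field into an ordinary C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copy a fixed-width field (NUL-padded, maybe not
   NUL-terminated) into dst as a proper C string. Returns the number of
   characters stored, excluding the terminator; truncates if dst is small. */
static size_t fixed_to_cstr(char *dst, size_t dst_size,
                            const char *field, size_t field_size)
{
    assert(dst != NULL && field != NULL && dst_size > 0);

    /* Never read past the field: look for padding within field_size only. */
    const char *end = memchr(field, '\0', field_size);
    size_t len = end ? (size_t)(end - field) : field_size;

    if (len > dst_size - 1)
        len = dst_size - 1;        /* leave room for the terminator */

    memcpy(dst, field, len);
    dst[len] = '\0';
    return len;
}
```

Keeping the conversion in one place means code downstream only ever sees terminated strings, which addresses the concern about a field being passed along as a plain `char *`.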
| ▲ | masklinn 3 hours ago | parent [-] | | > an era where shaving bytes on storage was important Fixed size strings don’t save bytes on storage tho, when the bank reserves 20 bytes for first name and you’re called Jon that’s 17 bytes doing fuckall. What they do is make the entire record fixed size and give every field a fixed relative position so it’s very easy to access items, move record around, reuse allocations (or use static allocation), … cycles is what they save. |
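The fixed-position point can be sketched with plain offset arithmetic. The `struct customer` layout and both helper names below are assumptions made for illustration; the 20-byte first-name field follows the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size record: every field sits at a known offset. */
struct customer {
    char first_name[20];
    char last_name[20];
};

/* Byte offset of record number index in a file or buffer of records. */
static size_t record_offset(size_t index)
{
    return index * sizeof(struct customer);
}

/* Byte offset of the last_name field of record number index. */
static size_t last_name_offset(size_t index)
{
    return record_offset(index) + offsetof(struct customer, last_name);
}
```

Because every record has the same size, locating record `i` is a single multiplication, with no scanning for delimiters or length prefixes.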
| |
| ▲ | integralid 18 hours ago | parent | prev | next [-] | | That's not a problem with strncpy, right? Fixed-width records are a thing of the past, and even then they were only used for on-disk storage. | |
| ▲ | andrepd 19 hours ago | parent | prev [-] | | Seriously. We have type systems and compilers that help us to not forget these things. It's not the 70s anymore! | | |
|
|
|
|
| ▲ | dingi 19 hours ago | parent | prev [-] |
| Isn't strlcpy the safer solution these days? |
| |
| ▲ | jandrese 17 hours ago | parent [-] | | I don't think anybody in this thread read the article. Strlcpy tries to improve the situation but still has problems. As the article points out it is almost never desirable to truncate a string passed into strXcpy, yet that is what all of those functions do. Even worse, they attempt to run to the end of the string regardless of the size parameter so they don't even necessarily save you from the unterminated string case. They also do loads of unnecessary work, especially if your source string is very long (like a mmapped text file). Strncpy got this behavior because it was trying to implement the dubious truncation feature and needed to tell the programmer where their data was truncated. Strlcpy adopted the same behavior because it was trying to be a drop-in replacement. But it was a dumb idea from the start and it causes a lot of pain unnecessarily. The crazy thing is that strcpy has the best interface, but of course it's only useful in cases where you have externally verified that the copy is safe before you call it, and as the article points out if you know this then you can just use memcpy instead. As you ponder the situation you inevitably come to the conclusion that it would have been better if strings brought along their own length parameter instead of relying on a terminator, but then you realize that in order to support editing of the string as well as passing substrings you'll need to have some struct that has the base pointer, length, and possibly a substring offset and length and you've just re-invented slices. It's also clear why a system like this was not invented for the original C that was developed on PDP machines with just a few hundred KB of RAM. Is it really too late for the C committee to develop a modern string library that ships with base C26 or C27? 
I get that they really hate adding features, but C strings have been a problem for over 50 years now, and I'm not advocating for the old strings to be removed or even deprecated at this time. Just that a modern replacement be available and to encourage people to use them for new code. | | |
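For readers who have not seen the "re-invented slices" idea in C, here is a minimal sketch. `struct str_slice` and its helpers are hypothetical and do not come from bstrlib or any standard proposal; the slice is non-owning and read-only, so it covers substrings but not editing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning string slice: base pointer plus length,
   no terminator required. */
struct str_slice {
    const char *ptr;
    size_t len;
};

/* View an existing C string as a slice (one strlen, no copy). */
static struct str_slice slice_from_cstr(const char *s)
{
    assert(s != NULL);
    return (struct str_slice){ s, strlen(s) };
}

/* Sub-slice of at most count bytes starting at start, clamped to bounds. */
static struct str_slice slice_sub(struct str_slice s, size_t start, size_t count)
{
    if (start > s.len)
        start = s.len;
    if (count > s.len - start)
        count = s.len - start;
    return (struct str_slice){ s.ptr + start, count };
}

/* Byte-wise equality of two slices. */
static int slice_eq(struct str_slice a, struct str_slice b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

Taking a substring here is pointer arithmetic with a bounds clamp, so nothing is copied or truncated, and no function ever has to search for a terminator.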
| ▲ | cyberpunk 16 hours ago | parent [-] | | Do they really need to at this point? Just include bstrlib and stop thinking about it? | | |
| ▲ | jandrese 16 hours ago | parent [-] | | Having an official replacement is the only thing that I think will motivate the majority of C programmers to finally switch. |
|
|
|