| ▲ | kevin_thibedeau 14 hours ago |
| Systems programmers love to hate on unsigned integers. Generations have been infected with the Java world model that integers have to be pretend number lines centered on zero. Guess what, you still have boundary conditions to deal with. There are times when you really really need to use the full word range without negative values. This happens more often with low-level programming and machines with small word sizes, something fewer people are engaged in. It doesn't need to be the default. Ada has them sequestered as modular types, but they're available to use when needed. |
|
| ▲ | pjmlp 12 hours ago | parent | next [-] |
| Java doesn't have unsigned primitive types because James Gosling did a series of interviews at Sun among "expert" C devs, and they all got the C language rules for unsigned arithmetic wrong. Yes, I miss them in Java as primitives; however, there are utility methods for unsigned arithmetic that get it right. |
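For a taste of the rules in question, here is a minimal C sketch of the usual arithmetic conversions (an illustrative example, not one of the actual interview questions):

    #include <stdio.h>

    int main(void) {
        int a = -1;
        unsigned int b = 1;
        /* Usual arithmetic conversions: the signed operand is converted
           to unsigned, so a becomes UINT_MAX and the comparison is false. */
        if (a < b)
            printf("-1 < 1u\n");
        else
            printf("-1 >= 1u (the signed value was converted)\n");
        return 0;
    }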
| |
| ▲ | layer8 11 hours ago | parent [-] | | Java has char as an unsigned 16-bit integer type. They should have made byte unsigned as well. | | |
▲ | pjmlp 3 hours ago | parent [-] | | Usually you don't do arithmetic with char in Java; this isn't C's anything-goes culture. |
|
|
|
| ▲ | uecker 14 hours ago | parent | prev | next [-] |
| Having them available is not the issue; using them for sizes and indices is what causes a lot of tricky bugs. |
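A minimal sketch of the kind of tricky bug meant here, assuming a plain size_t count (the function is hypothetical):

    #include <stddef.h>
    #include <stdio.h>

    /* Intended to print all but the last element. */
    void print_all_but_last(const int *a, size_t n) {
        /* Bug: when n == 0, n - 1 wraps to SIZE_MAX and the loop
           reads far past the array instead of doing nothing. */
        for (size_t i = 0; i < n - 1; i++)
            printf("%d\n", a[i]);
    }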
| |
▲ | jltsiren 11 hours ago | parent | next [-] | | I find it the opposite. Unsigned integers are intuitive, while signed integers are unintuitive and cause a lot of tricky bugs, especially in languages where signed overflow is undefined behavior. It's pretty rare to have values that can be negative but are always integers, at least in the work I do. The most common cases I encounter are approximations of something related to log probability, such as various scores in dynamic programming and graph algorithms. Most of the time, when you deal with integers, you need special handling to avoid negative values. Once you get used to thinking about unsigned integers, you quickly develop robust ways of avoiding situations where the values would be negative. | | |
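The undefined-behavior point, as a small illustrative C sketch (the function names are made up):

    #include <limits.h>

    int inc_signed(int x) {
        /* If x == INT_MAX this overflows: undefined behavior in C,
           and optimizers may assume it cannot happen. */
        return x + 1;
    }

    unsigned inc_unsigned(unsigned x) {
        /* Unsigned arithmetic is defined to wrap modulo 2^N,
           so UINT_MAX + 1u is simply 0u. */
        return x + 1u;
    }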
▲ | uecker 3 hours ago | parent [-] | | It is interesting that you find unsigned integers more intuitive. My experience (also with students, but analysis of CVEs also gives plenty of evidence) is that the opposite is true: signed integers in C are a model of integers, which have a nice mathematical structure that people learn in elementary school. Yes, this breaks down on overflow, but for that you have to reach very high numbers, and there is very good tooling to debug it. In contrast, unsigned integers in C are modular arithmetic, which people learn at university, if at all, and get wrong all the time, and the errors are mostly subtle and very difficult to find automatically. You are right that you often need to constrain an integer to be non-negative or positive, but usually not during arithmetic, rather at certain points in the logic of a program, and in my experience that is better expressed as an assertion. |
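One way to read "better expressed as an assertion", as a rough sketch with a hypothetical helper (signed throughout, the constraint checked only where the logic needs it):

    #include <assert.h>

    /* Compute with signed arithmetic, where intermediate values may
       legitimately dip below zero, and enforce the constraint only at
       the point in the program logic where it matters. */
    long element_index(long base, long offset, long length) {
        long i = base + offset;
        assert(i >= 0 && i < length);
        return i;
    }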
| |
| ▲ | throwaway894345 13 hours ago | parent | prev [-] | | Why does an unsigned type for sizes or indices fare worse than a signed type? When do I want the -247th element in an array? When do I have a block that is -10 bytes in size? | | |
▲ | charlie90 11 hours ago | parent | next [-] | | Because doing subtraction on sizes/indices is common, and signed handles the case where you subtract below 0. Unsigned yields unintuitive results; i.e., unsigned fails silently. For example, looping to the 2nd-to-last item in an array, or getting the index before the given index. The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative. It's 100% completely valid to assign a negative value to an unsigned, it just fails silently. If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile time and runtime. | | |
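A minimal sketch of the silent wrap being described, assuming a 64-bit size_t (the values are made up):

    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        size_t used = 3, capacity = 8;
        /* The subtraction order is wrong, but nothing fails loudly:
           3 - 8 wraps to 18446744073709551611 instead of -5. */
        size_t spare = used - capacity;
        printf("%zu\n", spare);
        return 0;
    }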
▲ | throwaway894345 6 hours ago | parent [-] | |
> The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative. It's 100% completely valid to assign a negative value to an unsigned, it just fails silently.
C's implicit casts are tripping you up. Unsigned ints can't be negative; C will happily let you assign a negative signed int to an unsigned int variable, but the moment it is assigned it ceases to be negative. In serious programming languages this implicit assignment is forbidden; you have to cast explicitly.
> For example, looping to the 2nd-to-last item in an array, or getting the index before the given index.
I don't understand what you mean here, can you clarify?
> If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile time and runtime.
Unsigned integers are the compile-time side of the coin, but yes, you may want to take care to enforce it at runtime as well, though this typically implies a performance penalty that most don't want to pay. | | |
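The conversion at the moment of assignment, sketched in C (illustrative only; a stricter language would reject this without an explicit cast):

    #include <stdio.h>

    int main(void) {
        /* Well-defined in C: -5 is converted modulo UINT_MAX + 1,
           so u holds 4294967291 (with 32-bit unsigned int), not -5. */
        unsigned int u = -5;
        printf("%u\n", u);
        return 0;
    }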
▲ | uecker 2 hours ago | parent [-] | | In C your compiler can help you with conversions, and if it doesn't, please use a better one. In this regard, C is a very pragmatic language, and hence for actual work it is a more "serious" programming language than languages which are based on some idealistic theory that pedantic typing will fix all your problems but actually keeps you from doing your job. |
|
| |
▲ | uecker 13 hours ago | parent | prev | next [-] | | The reason is not that you want a negative index or size, but that you want the computation of the index to be correct, and you want to have obvious errors. Both turn out to be easier with signed types. | |
▲ | kevin_thibedeau 13 hours ago | parent | prev | next [-] | | There are (rare) times when you want negative array indices. C lets you index in both directions from a pointer to the middle of an array. That's why array indexing is signed in C. Some libc ctype lookup tables do this. For sizing there is no strong case for negatives other than to shoehorn them into signed operations. | |
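The pattern looks roughly like this (a simplified sketch, not actual libc code):

    #include <stdio.h>

    /* A table indexed from -1 to 255, as ctype-style tables sometimes
       are, so that EOF (-1) is a valid index. */
    static int storage[257];
    static int *table = storage + 1;   /* table[-1] is storage[0] */

    int main(void) {
        table[-1] = 42;                /* legal: still inside storage */
        printf("%d\n", table[-1]);
        return 0;
    }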
| ▲ | throwaway894345 13 hours ago | parent [-] | | That’s interesting but seems pretty dangerous. How do you know you aren’t going to decrement off the front of the array? Keeping the pointer to the first element in the array and using offsets seems safer for humans and I don’t think the computer would care. | | |
▲ | mmilunic 12 hours ago | parent | next [-] | | Kind of a smart-aleck response, but how do you know you aren't going to increment off the end of the array when operating normally? I guess it is twice the danger. | |
▲ | 8note 12 hours ago | parent | prev [-] | | I don't want an unsigned int either, though. How do you know your arbitrarily sized number is inside the size of the array? Best off having a bespoke type that understands how big the array it's indexing is. |
|
| |
▲ | wavemode 12 hours ago | parent | prev [-] | |
> When do I want the -247th element in an array?
You never want any element of an array, except elements within the range [0, array_length). Anything outside of that is undefined behavior. I think people tend to overthink this. A function which takes an index argument should simply return a result when the index is within the valid range, and error if it's outside of it (regardless of whether it's outside by being too low or too high). It doesn't particularly matter whether the integer is signed. If you aren't storing 2^64 elements in your array (which you probably aren't; most systems don't even support addressing that much memory), then the only thing unsigned gets you is a bunch of footguns (like those described in the OP article). |
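In code, the point is roughly this (a hypothetical helper; once you bounds-check, the signedness of the index barely matters):

    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true and writes *out only when i is within [0, len). */
    bool try_get(const int *a, ptrdiff_t len, ptrdiff_t i, int *out) {
        if (i < 0 || i >= len)
            return false;   /* -247 and len + 5 are rejected the same way */
        *out = a[i];
        return true;
    }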
|
|
|
| ▲ | pron 13 hours ago | parent | prev | next [-] |
| In Java, unsigned arithmetic is available through an API and, as you said, it is pretty much only needed when marshalling to certain wire protocols or for FFI. Built-in unsigned types are useful primarily for bitfields or similar tiny types with up to 6 bits or so. |
| |
| ▲ | pjmlp 12 hours ago | parent [-] | | I miss them for doing bit juggling like file headers or networking packets. However I do concede writing a few helper methods isn't that much of a burden. | | |
| ▲ | pron 12 hours ago | parent [-] | | I think all the unsigned arithmetic you need is already offered. Unsigned shift right is an operator; the primitive wrappers offer compareUnsigned, divideUnsigned, and remainderUnsigned, as well as conversion methods; unsigned exponentiation is offered in Math (because signed types in Java wrap, there's no need for special unsigned addition/subtraction). |
|
|
|
| ▲ | tialaramex 8 hours ago | parent | prev | next [-] |
| > Systems programmers love to hate on unsigned integers
I don't see this hate in Rust. I think this is a big thing in the C-related languages, and that the author has chosen to pretend it's the same for any "systems language", but it is not. |
|
| ▲ | einpoklum 14 hours ago | parent | prev [-] |
| > There are times when you really really need to use the full word range without negative values.
There are a few of those, but that is the niche case, certainly when we're talking about 64-bit size types. And if you want to cater to smaller size types, then just template over the size type. Or, OK, some other trick if it's C rather than C++. |
| |
▲ | pixelesque 13 hours ago | parent [-] | | Sometimes (and very often in some scenarios/industries, e.g. HPC for graphics and simulation, with indices for things like points, vertices, primvars, voxels, etc.) you also want pretty good efficiency in the size of the datatype, for memory/cache performance reasons: you're storing millions of them and need random addressing (so you can't really bit-pack to, say, 36 bits, at least not without overhead away from native types, which are really needed for maximum speed without any branching). Losing half the range to make them signed when you only care about positive values 95% of the time (and in the rare case when you do any modulo on top of them you can cast, or write wrappers for that) is just a bad trade-off. Yes, you've still then only doubled the range to 2^32, and you'll still hit it at some point, but that extra bit of range can make a lot of difference from a memory/cache efficiency standpoint by letting you avoid jumping to 64-bit. So very often uint32_t is a very good sweet spot for size: int32_t is sometimes too small, and (u)int64_t is generally not needed and too wasteful. | |
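A rough sketch of the size trade-off (hypothetical mesh-style structs; the names are made up):

    #include <stdint.h>
    #include <stdio.h>

    struct tri32 { uint32_t v[3]; };   /* 12 bytes per triangle */
    struct tri64 { uint64_t v[3]; };   /* 24 bytes per triangle */

    int main(void) {
        /* With millions of triangles, halving the index width roughly
           halves the memory traffic for this data. */
        printf("%zu vs %zu bytes\n", sizeof(struct tri32), sizeof(struct tri64));
        return 0;
    }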
▲ | marshray 12 hours ago | parent | next [-] | | As I generally believed in Moore's law, i.e., accepted the notion that transistor counts were exponential, I was surprised at how long the difference between a 2 GiB address space and a 3 GiB address space was relevant in practice. In theory, it should have been at most a year. In practice, the Windows XP /3GB boot switch (which allocates 3 GiB of virtual address space to user mode and 1 GiB to the kernel instead of the usual 2 and 2) was relevant for many years. | |
▲ | tialaramex 7 hours ago | parent [-] | | If 64-bit was an easy option for you, the transition didn't happen around the /3GB switch; it typically happened at about 1 GB of RAM, and yeah, it wasn't very long, just as you imagined, because of Moore's law. So that /3GB switch is for people who are stuck on the wrong hardware for a variety of reasons, and the timing is about how long those people stayed trapped rather than how long before this became a bad idea (it was a bad idea before it even shipped, but it was necessary). Linux had some more extreme splits, including a 3.5:0.5 split and a nasty 4:4 split (in which all the userspace addresses are invalidated when in kernel space, ugh), and it's for the same reason: these aren't customers who chose not to go to 64-bit, they're customers who can't yet and will pay $$$$ to keep what they are doing working for just a while longer anyway. |
| |
▲ | einpoklum 10 hours ago | parent | prev [-] | |
> HPC for graphics and simulation with indices
Those are not sizes of data structures.
> Losing half the range
It's not part of the range of sizes they can use, with any typical data structure.
> Losing half the range to make them signed when you only care about positive values 95% of the time is just a bad trade-off.
It's the right choice for sizes in the standard library (in C++) or standard-ish/popular libraries in C. And, again, it's the wrong type. For example, even if you only care about positive values, their difference is not necessarily positive. |
|
|