Interesting read, even as someone who isn't using Zig.

I wonder, these arbitrary-width integers... Is it actually even really worth it? My intuition is to prefer manually packing/unpacking things instead (in any language, even C that has bit width for struct fields), because it gives me a better mental picture of the code that is actually generated. Particularly for something like an signed odd-bit integer - what kind of code gets generated for sign-extension, a presumably common operation?

Does anybody have other experiences with them, one way or the other?

▲

hansvm 3 hours ago | parent | next [-]

IIRC, for "normal" bit widths the codegen basically uses the next larger machine type and preserves zero bits on the high end. An i3 is an i8 with five MSB zeroes (with more custom behavior for "packed" i3 values). It's UB to fill those with non-zero values. For larger bit widths, like u729, you concatenate many large machine types, the compiler generates instructions in an unrolled loop, and the LLVM optimization pass usually doesn't clean that up (though, now that integers are apparently not using the LLVM u729 implementation, perhaps there are some more optimization opportunities).

They're situationally useful, especially when performance isn't an enormous concern. That u729 example above came from a variant sudoku solver I wrote to aid developing new puzzles (easy to check the rough magnitude of the solution space for whatever idea I was mulling over and examine how restricted the board actually was -- just an intermediate step in puzzle design). It's not optimal (hard on the icache, can be hard on registers, other issues abound), but it's dead simple to use, and the assembly isn't terrible, beating all the normal solvers I saw floating around. It's a nice point on the laziness/correctness/good-enough-perf pareto curve.

Another comment mentioned this, but they're great in packed structs for representing weird numeric entities (I think I have a logarithmic number system floating around which does that).

One thing the language does quite a lot is use them to guard against certain classes of human error at compile time. It doesn't perfectly make impossible actions unrepresentable, but shoving a full u32 into a shift argument usually doesn't make sense, so the types are constrained to be smaller.

▲

nvme0n1p1 3 hours ago | parent [-]

I can't imagine any situation where I'd use a u729 instead of a StaticBitSet. For size 729, it would end up backed by a bit_set.Array, not a bit_set.Integer.

https://ziglang.org/documentation/master/std/#std.bit_set.St...

▲

AlotOfReading 2 hours ago | parent | next [-]

I don't program zig, so it's not clear to me if you can use zig's bitsets arithmetically.

Sometimes it's just more clear to work with integers than other representations. Most situations with a state space of N bits have meaningful integer representations, where arithmetic functions on those representations are also meaningful.

For example, CRCs can be written as the remainder from long division of the message by the polynomial. Defining nontrivial cyclic permutations is also much more straightforward as functions on integers than on bitsets.

	▲	nvme0n1p1 2 hours ago \| parent [-]
		For other situations like a CRC on an arbitrarily-sized message, a big int would be better, surely? You can do long division on those. https://ziglang.org/documentation/0.16.0/std/#std.math.big.i... I was talking about GP's u729, which is 999, the state space of a sudoku board. Can you come up with a situation where dividing that number by anything is meaningful?

▲

hansvm 2 hours ago | parent | prev [-]

Old habits :)

If I had to steel-man the idea, I'm pretty sure the integer-based solution has better codegen with many kinds of sparse, comptime-known masks. I think you're right though, StaticBitSet looks better.

	▲	nvme0n1p1 2 hours ago \| parent [-]
		For your specific case, even a simple `[9][9]u16` might perform better (where you make use of nine bits in each u16). For each entry, the nine mask bits would be in the same bit positions, so the compiler won't have to do a bunch of shifts to extract/align the bits. CPUs love consistency. I doubt it's worth the additional codegen complexity to save 70 bytes in your data model.

▲

flohofwoe an hour ago | parent | prev | next [-]

It's pretty great in my toy emulator project (https://github.com/floooh/chipz) as 'system bus' where each bit is a 'wire' which is then mapped to chip input/output pins.

The bus-width is a generic parameter and can be below or above 64 bits (depending on the emulated system). With arbitrary-width integers the high level code remains the same no matter what the bus-width is, and from looking at the compiler output, as long as bit operations don't straddle the underlying 64-bit integer boundary, those bit operations are just as efficient as working on a simple 64-bit int.

Also AFAIK LLVM supports arbitrary-width integers since pretty much forever, Zig just 'exposed' them in the language (as later did Clang via _ExtInt(N), which is now deprecated in favour of C23's _BitInt(N)).

The other nice usage (also in emulators) is for chip registers and counters, those often have odd widths (like 5 bits), and writing those as u5 instead of u8 in the code is just nicer since it matches the chip documentation, and when reading the code it's immediately clear that this u5 is a 5-bit counter or register.

▲

ismailmaj 4 hours ago | parent | prev | next [-]

It's great for defining fancy floats used in machine learning

e.g. https://github.com/zml/zml/blob/33ced8fa078b3c7c8c709bd526ae...

▲

y1n0 3 hours ago | parent | prev [-]

As an fpga engineer dealing with bitwidths that are non-byte multiples is very normal and when I end up writing software for various reasons, I often miss it. Usually when trying to slice and parse or construct messages.

Obviously there are ways around pretty much everything, but it’s nice to have first class language support for bit slices.

▲

NooneAtAll3 3 hours ago | parent [-]

except it isn't bit slice, it isn't indexing within a range - it's just integer type that only allows values up to 2^width, with same alignment rounding up as with the rest

	▲	hmry 3 hours ago \| parent [-]
		It's a bit slice if you put it in a packed struct. I like them, they're nicer than C's bitfields: The order isn't implementation-defined, and the types remember their range rather than being converted to a power-of-two size upon read. (Maybe that's possible with C23 _BitInt(n), I haven't tried if those work in bitfields)