ThatGuyRaion 6 hours ago

Question for those smarter than me: What is an application for an int128 type anyways? I've never personally needed it, and I laughed at RISC-V for emphasizing that early on rather than... standardizing packed SIMD.

eisenwave an hour ago | parent | next [-]

In 2024 I published a C++ proposal for a 128-bit integer type: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p31...

You can find a lot of motivation for 128-bit integers in that paper, such as fixed-point operations, implementing 128-bit (decimal) floating-point, financial calculations, cryptography, etc. However, that proposal has since been superseded by P3666, which aims to bring C23's _BitInt type to C++; that wouldn't just allow 128-bit integers (as _BitInt(128)) but any other width as well.
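
For a taste, a minimal sketch of what that would enable (assuming a compiler with C23 _BitInt support at width 128, e.g. recent Clang; the standard only guarantees widths up to BITINT_MAXWIDTH):

    #include <stdint.h>

    typedef unsigned _BitInt(128) u128;

    /* full 64x64 -> 128 product; cannot overflow */
    u128 widening_mul(uint64_t a, uint64_t b) {
        return (u128)a * (u128)b;
    }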

PaulDavisThe1st 4 hours ago | parent | prev | next [-]

  int64_t a, b, c, r;

  /* a * b can overflow int64_t, so do the intermediate step in 128 bits */
  r = (int64_t)(((__int128)a * b) / c);
cmovq 2 hours ago | parent [-]

Last time I checked, LLVM had surprisingly bad codegen for this using __int128. On x86-64 you only need two instructions:

    /* result = num * multiplier / divisor, with a 128-bit intermediate */
    uint64_t muldiv64(uint64_t num, uint64_t multiplier, uint64_t divisor) {
        uint64_t result;
        __asm (
            "mulq %[multiplier]\n"   /* rdx:rax = rax * multiplier */
            "divq %[divisor]\n"      /* rax = rdx:rax / divisor */
            : "=a"(result)
            : "a"(num), [multiplier]"r"(multiplier), [divisor]"r"(divisor)
            : "rdx"
        );
        return result;
    }
The intermediate 128-bit number is in rdx:rax.
bonzini an hour ago | parent [-]

That only works if you are sure the quotient fits in 64 bits; divq raises a divide fault if it doesn't. If you can have divisor < multiplier and need to detect overflow, it's more complicated.
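
For the curious, a sketch of the check using the GCC/Clang unsigned __int128 extension (the function and parameter names here are made up): divq faults exactly when the high 64 bits of the dividend are >= the divisor, so that is the overflow test:

    #include <stdint.h>

    /* returns 0 and stores num * mul / div in *out, or 1 on overflow */
    int muldiv64_checked(uint64_t num, uint64_t mul, uint64_t div,
                         uint64_t *out) {
        unsigned __int128 p = (unsigned __int128)num * mul;
        if ((uint64_t)(p >> 64) >= div)
            return 1;               /* quotient wouldn't fit in 64 bits */
        *out = (uint64_t)(p / div);
        return 0;
    }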

sparkie 5 hours ago | parent | prev | next [-]

Cryptography would be one application. Many crypto libraries use an arbitrary-size `bigint` type, but the algorithms typically use modular arithmetic on fixed-width types (128-bit, 256-bit, 512-bit, or something in between like 384 bits).

They're typically implemented with arrays of 64-bit or 32-bit unsigned integers, but if 128-bit integers were available in hardware, we could get a performance boost. Any arbitrary-precision integer library would benefit from 128-bit hardware integers.
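
As an illustration of the limb arithmetic (a sketch using the GCC/Clang unsigned __int128 extension, not any particular library's code):

    #include <stdint.h>

    /* one 64x64 -> 128 step of schoolbook bignum multiplication */
    uint64_t mul_limb(uint64_t a, uint64_t b, uint64_t *hi) {
        unsigned __int128 p = (unsigned __int128)a * b;
        *hi = (uint64_t)(p >> 64);   /* high limb of the product */
        return (uint64_t)p;          /* low limb */
    }

With native 128-bit integers, the same step could operate on 128-bit limbs, halving the number of partial products.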

ThatGuyRaion 5 hours ago | parent [-]

I suppose that makes sense -- though SIMD seems more useful for accelerating a lot of crypto?

sparkie 4 hours ago | parent [-]

SIMD is for performing parallel operations on many smaller values. It can help with some cryptography, but it doesn't necessarily help when performing single arithmetic operations on larger types. It does, however, help when performing logic and shift operations on larger types.

If we were performing 128-bit arithmetic in parallel over many values, then a SIMD implementation may help, but without a SIMD equivalent of `addcarry`, there's a limit to how much it can help.

Something like this could potentially be added to AVX-512, for example, by utilizing the `k` mask registers for the carries.

The best we currently have are `adcx` and `adox`, which let us run two interleaved add-with-carry chains, one using the carry flag and the other the overflow flag, which improves ILP. These instructions are quite niche, but they're used in bigint libraries to improve performance.
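
In compiler-friendly form, the chain looks like this (a sketch using the x86 _addcarry_u64 intrinsic; whether it compiles to plain adc or interleaved adcx/adox is up to the compiler):

    #include <immintrin.h>
    #include <stdint.h>

    /* 256-bit add: four 64-bit limbs chained through the carry flag */
    void add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
        unsigned char c = 0;
        c = _addcarry_u64(c, a[0], b[0], (unsigned long long *)&r[0]);
        c = _addcarry_u64(c, a[1], b[1], (unsigned long long *)&r[1]);
        c = _addcarry_u64(c, a[2], b[2], (unsigned long long *)&r[2]);
        (void)_addcarry_u64(c, a[3], b[3], (unsigned long long *)&r[3]);
    }

Each step consumes the carry produced by the previous one, which is exactly the serial dependency discussed below.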

wahern an hour ago | parent [-]

> but it doesn't necessarily help when performing single arithmetic operations on larger types.

For the curious, AFAIU the problem is the dependency chains. For example, for simple bignum addition you can't just naively perform all the adds on each limb in parallel and then apply the carries in parallel; the addition of each limb depends on the carries from the previous limbs. Working around these issues with masking and other tricks typically ends up adding too many additional operations, resulting in lower throughput than non-SIMD approaches.

There are quite a few papers on using SIMD to accelerate bignum arithmetic for single operations, but they all seem quite complicated and heavily qualified. The threshold for eking out any gain is quite high, e.g. 512-bit numbers at minimum and often much larger, depending. And they tend to target complex or specialized operations (not straight addition, multiplication, etc.) where clever algebraic rearrangements can profitably reorder dependency chains for SIMD specifically.

cornstalks 4 hours ago | parent | prev | next [-]

I implemented a rational number library for media timestamps (think CMTime, AVRational, etc.) that uses 64-bit numerators and denominators. It uses 128-bit integers for intermediate operations when adding, subtracting, multiplying, etc. It even uses 128-bit floats (represented as two doubles, using double-double arithmetic[1]) for some approximation operations, and 192-bit integers in one spot (IIRC it multiplies a 128-bit int by a 64-bit int where I just want the high bits, so it shifts back down to 128 bits immediately after the multiplication).
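
The core pattern is something like this (a generic sketch using the GCC/Clang __int128 extension, not the actual library code):

    #include <stdint.h>

    typedef struct { int64_t num, den; } rat64;

    /* exact sum of two 64-bit rationals; intermediates need up to 128 bits */
    rat64 rat_add(rat64 x, rat64 y) {
        __int128 n = (__int128)x.num * y.den + (__int128)y.num * x.den;
        __int128 d = (__int128)x.den * y.den;
        /* real code would reduce by gcd(n, d) and check the results
           still fit in 64 bits before narrowing */
        return (rat64){ (int64_t)n, (int64_t)d };
    }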

I keep meaning to see if work will let me open source it.

[1]: https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...

fluoridation 5 hours ago | parent | prev | next [-]

The last time I used one, I wanted UNIX timestamps + fractional seconds. Since there was no difference between adding 1 bit or 64, I just gave it 32 bits for the fraction and 32 more bits for the integral part.
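
In code, the layout was roughly this (a sketch using the GCC/Clang __int128 extension):

    #include <stdint.h>

    /* 96 bits of whole seconds, 32-bit binary fraction in the low bits */
    typedef __int128 ts128;

    ts128 ts_make(int64_t unix_secs, uint32_t frac32) {
        return ((ts128)unix_secs << 32) | frac32;
    }

Addition and subtraction then work directly on the fixed-point value.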

rurban 2 hours ago | parent | prev | next [-]

Any application that does arithmetic on 64-bit ints, because most operations can overflow, and most libs/compilers don't check for overflow.

green7ea 3 hours ago | parent | prev | next [-]

I made a time-sync library over a local network that had to be more precise than NTP, and used i128 to make sure the i64 math I was doing couldn't overflow.

i32 didn't cover enough time span, and f64 has edge cases from the nature of floats. This was for Windows (MSVC, not GCC), so I had to roll my own i128.
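
The core of a hand-rolled 128-bit type is short; a minimal sketch (not my actual code) of the add:

    #include <stdint.h>

    /* two's complement; a signed i128 uses the same addition */
    typedef struct { uint64_t lo, hi; } u128;

    u128 u128_add(u128 a, u128 b) {
        u128 r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low word */
        return r;
    }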

adgjlsfhk1 5 hours ago | parent | prev | next [-]

It's used fairly frequently (e.g. in turning 64-bit division into multiplication and shifts).
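
That trick needs the 128-bit intermediate: dividing by a constant becomes a 64x64 -> 128 multiply by a precomputed reciprocal plus a shift. A sketch using the GCC/Clang unsigned __int128 extension:

    #include <stdint.h>

    /* x / 10 without a divide: 0xCCCCCCCCCCCCCCCD == ceil(2^67 / 10) */
    uint64_t div10(uint64_t x) {
        unsigned __int128 p = (unsigned __int128)x * 0xCCCCCCCCCCCCCCCDull;
        return (uint64_t)(p >> 67);
    }

Compilers emit exactly this kind of sequence when dividing by a constant.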

bandrami 5 hours ago | parent | prev | next [-]

It's an opaque way to hold a GUID or an IPv6 address.

bsder 4 hours ago | parent | prev [-]

Intersection calculations from computational geometry: with n-bit input coordinates, they generally require about 2n + log2(n) bits.

If you like your CAD accurate, you have to operate in integer space.
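
For example, the standard 2D orientation test already needs ~2n-bit intermediates (a sketch using the GCC/Clang __int128 extension; coordinates should stay under roughly 62 bits so the worst case fits in 128):

    #include <stdint.h>

    /* sign of the cross product (b - a) x (c - a): products of coordinate
       differences need ~2n bits, hence the 128-bit intermediates */
    int orient2d(int64_t ax, int64_t ay, int64_t bx, int64_t by,
                 int64_t cx, int64_t cy) {
        __int128 cross = (__int128)(bx - ax) * (cy - ay)
                       - (__int128)(by - ay) * (cx - ax);
        return (cross > 0) - (cross < 0);
    }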