b1temy 6 hours ago

I understand why a non-standard, compiler-specific implementation of int128 was not used (besides being compiler-specific, the point of the article is to walk through an implementation of it), but why use

> using u64 = unsigned long long;

? Although in practice this is _usually_ an unsigned 64-bit integer, the C++ standard does not technically guarantee it; all it says is that the type needs to be _at least_ 64 bits. [0]

I would use std::uint64_t, which guarantees a type of exactly that width, provided it is supported. [1]
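
For illustration, a minimal sketch (my own, not from the article) of what each type actually guarantees:

    #include <climits>
    #include <cstdint>

    // unsigned long long is only guaranteed to be *at least* 64 bits wide,
    // while std::uint64_t, where provided, is exactly 64 bits with no padding.
    static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
                  "guaranteed by the standard");
    static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64,
                  "holds by definition wherever std::uint64_t exists");

    using u64 = std::uint64_t; // exact width
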

Re: Multiplication: regrouping our u64 digits

I am aware that more advanced and faster algorithms exist, but I wonder whether something simple like Karatsuba's algorithm [2], which uses 3 multiplications instead of 4, could be a quick performance win over the naive method used in the article. Then again, since it was mentioned that the compiler-specific unsigned 128-bit integers multiply much like the ones built in the article, I suppose there must be a reason for the naive method to be used instead, or something I missed that makes Karatsuba unsuitable here.

Speaking of which, I would be interested to see how all these operations fare against the compiler-specific implementations (as well as comparisons between different compilers) [3]. The article only briefly mentioned that its multiplication method is similar to that of the builtin `__uint128_t` [4], but did not go into detail or mention similarities/differences for the other arithmetic operations.
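
For instance (a sketch of my own, assuming GCC/Clang's builtin 128-bit type and MSVC's _umul128 from <intrin.h>), the compiler-specific routes to a full 64x64 -> 128-bit multiply look roughly like this:

    #include <cstdint>
    #if defined(_MSC_VER)
    #include <intrin.h>
    #endif

    // Sketch: full 64x64 -> 128-bit multiply via compiler-specific routes.
    // Returns the high 64 bits; the low 64 bits go through *lo.
    inline std::uint64_t mul64_hi(std::uint64_t a, std::uint64_t b,
                                  std::uint64_t* lo) {
    #if defined(__SIZEOF_INT128__)             // GCC/Clang: builtin 128-bit type
        unsigned __int128 p = (unsigned __int128)a * b;
        *lo = (std::uint64_t)p;
        return (std::uint64_t)(p >> 64);
    #elif defined(_MSC_VER) && defined(_M_X64) // MSVC on x64: _umul128 intrinsic
        std::uint64_t hi;
        *lo = _umul128(a, b, &hi);
        return hi;
    #else
    #error "portable fallback omitted in this sketch"
    #endif
    }
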

[0] https://en.cppreference.com/w/cpp/language/types.html (The official standard must be purchased, which is why I did not reference it directly, but this should be under [basic.fundamental].)

[1] https://en.cppreference.com/w/cpp/types/integer.html

[2] https://en.wikipedia.org/wiki/Karatsuba_algorithm

[3] I suppose I could see for myself using godbolt, but I would like to see some commentary/discussion on this.

[4] And did not state for which compiler, though from context I suppose it would be MSVC?

sparkie 4 hours ago | parent | next

> I would use std::uint64_t, which guarantees a type of exactly that width, provided it is supported.

The comment on the typedef points out that the signatures of the intrinsics use `unsigned long long`, though he incorrectly states that `uint64_t` is `unsigned long` - which isn't true in general, as `long` is only guaranteed to be at least 32 bits and at least as large as `int`. Under LLP64 (64-bit Windows, for example), `long` is only 32 bits.

I don't think this really matters anyway. `long long` is 64 bits on pretty much everything that matters, and he is using architecture-specific intrinsics in the code, so it is not going to be portable anyway.

If some future architecture had 128-bit hardware integers and a data model where `long long` is 128 bits, we wouldn't need this code at all; we would just use the hardware support for 128 bits.

But I agree that `uint64_t` is the correct type to use in the definition of `u128` if we want to guarantee that it occupies the same storage. The width-specific intrinsics should also use this type.
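
A sketch of that point (assuming the lo/hi limb layout the article uses):

    #include <cstdint>

    // With exact-width limbs, the struct provably occupies the same
    // 16 bytes of storage as a hardware 128-bit integer would.
    struct u128 {
        std::uint64_t lo; // low 64 bits
        std::uint64_t hi; // high 64 bits
    };
    static_assert(sizeof(u128) == 16, "two exact-width limbs, no padding");
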

> I would be interested to see how all these operations fare against compiler-specific implementations

There's a godbolt link at the top of the article which has the comparison. The resulting assembly is basically equivalent to the built-in support.

Joker_vD 6 hours ago | parent | prev

Since they don't calculate the upper 128 bits of the product, they use only 3 multiplications anyway.
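
For anyone following along, a sketch of why (my own code, assuming a lo/hi limb layout; it borrows GCC/Clang's builtin 128-bit type just to express the one full-width 64x64 multiply):

    #include <cstdint>

    struct u128 { std::uint64_t lo, hi; }; // assumed limb layout

    // Low 128 bits of a 128x128-bit product. Of the four 64x64 partial
    // products, a.hi * b.hi feeds only the discarded upper half, so three
    // multiplications suffice.
    u128 mul_lo128(u128 a, u128 b) {
        unsigned __int128 p = (unsigned __int128)a.lo * b.lo; // full product
        std::uint64_t lo = (std::uint64_t)p;
        std::uint64_t hi = (std::uint64_t)(p >> 64)
                         + a.lo * b.hi  // only the low 64 bits of the
                         + a.hi * b.lo; // cross terms reach the result
        return {lo, hi};
    }
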

b1temy 6 hours ago | parent

You are right. Not sure how I missed/forgot that. In fact, I think the entire reason I was reminded of the algorithm was because I saw the words "3 multiplications" in the article in the first place. Perhaps I need more coffee...