purplesyringa 4 hours ago

The paper doesn't require a bitshift after the multiplication: it uses the high half of the product directly as the quotient, so it saves at least one tick over the solution you mentioned. And on x86, saturating addition can't be done in a single tick, while 32->64 zero-extension is implicit, so the gap is even wider.
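To illustrate "the high half of the product is the quotient", here is a minimal C sketch. It assumes GCC/Clang's `unsigned __int128` extension and uses the Lemire-style round-up constant M = ceil(2^64 / d). That is one known scheme with this shape, not necessarily the paper's exact construction. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: M = ceil(2^64 / d), computed as floor((2^64 - 1) / d) + 1.
   Valid for 2 <= d <= UINT32_MAX; for d == 1 the sum wraps to 0. */
static inline uint64_t magic_u32(uint32_t d) {
    return UINT64_MAX / d + 1;
}

/* Quotient n / d taken straight from the high 64 bits of the 128-bit product.
   On x86-64, MUL leaves that half in RDX, so compilers typically emit no shift
   for the ">> 64". Widening n from 32 to 64 bits is also free there, because
   writing a 32-bit register implicitly zero-extends it.
   Requires unsigned __int128 (a GCC/Clang extension). */
static inline uint32_t div_by_magic(uint32_t n, uint64_t m) {
    return (uint32_t)(((unsigned __int128)m * n) >> 64);
}
```

For example, `div_by_magic(100, magic_u32(7))` gives 14. Why it is exact: the rounding error e = M*d - 2^64 is less than d, so e*n < 2^64 for every 32-bit n, and that is enough to keep the floor from overshooting.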

aleph_minus_one 32 minutes ago

> And on x86, saturating addition can't be done in a tick

Perhaps I misunderstand your point, but I am fairly sure that the SSE/AVX instruction set families do have instructions for saturating addition:

* (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW

* (V)PHADDSW, (V)PHSUBSW
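To make the scalar-vs-vector distinction concrete, here is a minimal C sketch with hypothetical function names. The vector part assumes an x86 target with SSE2 (guarded by `__SSE2__`), where the `_mm_adds_epu16` intrinsic maps to PADDUSW: one instruction that saturates eight 16-bit lanes. As far as I know, the PADDS/PADDUS family only comes in 8- and 16-bit lane widths and works on vector registers. A 32-bit saturating add in a general-purpose register therefore has no single instruction and is typically compiled to an add plus a carry-based fix-up.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar unsigned 32-bit saturating add: add with wraparound, then
   force all ones if the add carried (s < a exactly when it wrapped). */
static inline uint32_t sat_add_u32(uint32_t a, uint32_t b) {
    uint32_t s = a + b;
    return s | -(uint32_t)(s < a);
}

#if defined(__SSE2__)
/* Eight lanes of unsigned 16-bit saturating add in one PADDUSW. */
static inline void sat_add_u16x8(const uint16_t a[8], const uint16_t b[8],
                                 uint16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu16(va, vb));
}
#endif
```

So the instructions exist, but they fit lane-parallel 8/16-bit data better than a single 32-bit scalar value that feeds a multiplication.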