| ▲ | purplesyringa 4 hours ago | |
The paper doesn't require a bitshift after multiplication -- it directly uses the high half of the product as the quotient, so it saves at least one tick over the solution you mentioned. And on x86, saturating addition can't be done in a tick and 32->64 zero-extension is implicit, so the gap is even wider. | ||
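To make the "high half of the product" idea concrete, here is a minimal Rust sketch. It is not the paper's construction: `mul_hi` and `div3` are hypothetical names, and the constant is the textbook `ceil(2^64 / 3)` multiplier, chosen only because it happens to need no extra shift for 32-bit inputs. On x86-64 a 64x64 multiply typically leaves the high half in its own register, so the `>> 64` in the source should not cost a separate shift instruction.

```rust
/// High 64 bits of the full 128-bit product a * b.
fn mul_hi(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) >> 64) as u64
}

/// Hypothetical example: divide a 32-bit value by 3 with one widening
/// multiply. The u32 -> u64 conversion is a zero-extension.
fn div3(n: u32) -> u32 {
    // ceil(2^64 / 3); exact for every n below 2^32.
    const M: u64 = 0x5555_5555_5555_5556;
    // The high half of n * M is already the quotient.
    mul_hi(n as u64, M) as u32
}

fn main() {
    for n in [0u32, 1, 2, 3, 10, 1_000_000, u32::MAX] {
        assert_eq!(div3(n), n / 3);
    }
    println!("div3 agrees with n / 3 on the sampled inputs");
}
```

Divisors whose multiplier does not fit this form need an additional correction step, which is where the approaches being compared differ.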
| ▲ | aleph_minus_one 32 minutes ago | |
> And on x86, saturating addition can't be done in a tick

Perhaps I misunderstand your point, but I am rather sure that in SSE.../AVX... there do exist instructions for saturating addition:

* (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW
* (V)PHADDSW, (V)PHSUBSW | ||
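For reference, the instructions listed above are vector instructions. A minimal Rust sketch of PADDUSB through the `_mm_adds_epu8` intrinsic from `std::arch` follows; the wrapper name `add_sat_u8x16` is made up for illustration, and the second definition is a portable fallback so the example also builds on other targets. As far as I know, scalar x86 code has no single saturating-add instruction, which may be the case the parent comment had in mind.

```rust
/// Lane-wise unsigned saturating add of sixteen bytes (PADDUSB on x86-64).
#[cfg(target_arch = "x86_64")]
fn add_sat_u8x16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::{__m128i, _mm_adds_epu8, _mm_loadu_si128, _mm_storeu_si128};
    let mut out = [0u8; 16];
    // SAFETY: SSE2 is part of the x86-64 baseline, and the unaligned
    // load/store intrinsics only touch the valid 16-byte arrays above.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sum = _mm_adds_epu8(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

/// Portable fallback with the same semantics, one lane at a time.
#[cfg(not(target_arch = "x86_64"))]
fn add_sat_u8x16(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

fn main() {
    // 250 + 10 clamps to 255 in every lane instead of wrapping to 4.
    assert_eq!(add_sat_u8x16([250; 16], [10; 16]), [255; 16]);
    println!("saturating lanes clamp at 255");
}
```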