201984 2 hours ago

In the context of encrypting 32- or 64-bit IDs, where there is no nonce, that'd be equivalent to XOR encryption and much weaker than TFA's small block ciphers.
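
To make the weakness concrete, here is a minimal sketch of what "XOR encryption" of IDs looks like when the keystream never changes. The pad value and the IDs are made up for illustration; the point is only that a fixed pad preserves XOR differences and falls to a single known plaintext.

```python
# Sketch: encrypting IDs by XOR with a fixed pad (no nonce, so the pad never changes).
PAD = 0x5EEDC0DE  # hypothetical 32-bit keystream word, e.g. a truncated cipher output


def xor_encrypt(plain_id: int, pad: int = PAD) -> int:
    """XOR with a constant; encryption and decryption are the same operation."""
    return (plain_id ^ pad) & 0xFFFFFFFF


c1 = xor_encrypt(1000)
c2 = xor_encrypt(1001)

# Differences between plaintext IDs leak straight through the "encryption".
leaked_difference = c1 ^ c2

# One known (plaintext, ciphertext) pair reveals the pad, and with it every other ID.
recovered_pad = c1 ^ 1000
recovered_id = c2 ^ recovered_pad
```

A block cipher over the same 32-bit domain does not have either property: related IDs map to unrelated outputs, and one known pair does not decrypt the others.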

▲ adrian_b an hour ago | parent | next [-]

If you really want to encrypt and decrypt 32-bit numbers without having any nonces available, the fastest way on non-microcontroller CPUs remains using the AES instructions.

You can exploit the fact that the core of AES consists of 32-bit invertible mixing functions. In order to extend AES to 128-bit, a byte permutation is used, which mixes the bytes of the 32-bit words.

The AES instructions let you cancel the byte permutation. In that case, you can use them to encrypt four 32-bit words separately, instead of one 128-bit block.

Similarly, by canceling the standard byte permutation and replacing it with separate permutations on the 2 halves, you can make the AES instructions independently encrypt two 64-bit words.

These AES modifications remain faster than any software cipher.

How to cancel the internal permutation and replace it with external shuffle instructions was already described in the Intel white paper published in 2010, at the launch of Westmere, the first CPU with AES instructions.
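
A rough way to see the column independence without touching the hardware is to model one round in plain Python. This is only a sketch: `aesenc` below is a software model of what the AESENC instruction computes (ShiftRows, SubBytes, MixColumns, then XOR with the round key), `aesenc_no_shift` is a made-up name for "apply the inverse ShiftRows first", and the round key is arbitrary. On a real CPU the pre-shuffle would be a byte shuffle such as pshufb.

```python
def xtime(a):
    """Multiply by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r


def build_sbox():
    """AES S-box: field inverse (x^254, with 0 -> 0) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
        s = inv
        for sh in range(1, 5):
            s ^= ((inv << sh) | (inv >> (8 - sh))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()

# The state is 16 bytes in column-major order: byte index = row + 4 * column.


def shift_rows(s):
    """Rotate row r left by r positions; this is what mixes the four columns."""
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def inv_shift_rows(s):
    """Inverse byte permutation, used here to cancel ShiftRows."""
    out = [0] * 16
    for c in range(4):
        for r in range(4):
            out[r + 4 * ((c + r) % 4)] = s[r + 4 * c]
    return out


def mix_columns(s):
    """Invertible linear mixing applied to each 32-bit column on its own."""
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def aesenc(state, round_key):
    """Software model of one AESENC round."""
    mixed = mix_columns(shift_rows([SBOX[b] for b in state]))
    return [m ^ k for m, k in zip(mixed, round_key)]


def aesenc_no_shift(state, round_key):
    """Pre-apply the inverse permutation so ShiftRows cancels out."""
    return aesenc(inv_shift_rows(state), round_key)


def changed_columns(x, y):
    """Indices of the 32-bit columns in which two states differ."""
    return sorted({i // 4 for i in range(16) if x[i] != y[i]})


ROUND_KEY = [0x11] * 16  # hypothetical round key
a = list(range(16))
b = a[:8] + [v ^ 0xFF for v in a[8:12]] + a[12:]  # differs from a only in column 2

print(changed_columns(aesenc(a, ROUND_KEY), aesenc(b, ROUND_KEY)))
print(changed_columns(aesenc_no_shift(a, ROUND_KEY), aesenc_no_shift(b, ROUND_KEY)))
```

The first line printed lists all four columns, since the standard round spreads a change in one column across the whole block. The second lists only column 2: with the permutation canceled, each 32-bit word goes through its own invertible function.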

▲ 201984 18 minutes ago | parent [-]

Are you certain using AES is still faster? Let's say for a 32-bit block size and 64-bit key.

From https://en.wikipedia.org/wiki/Speck_(cipher), that Speck combination would use 22 rounds, and using the instruction timings for Zen 5 from https://instlatx64.github.io/InstLatx64/AuthenticAMD/Authent..., it looks like each round would take at most 3 cycles. (The dependency chain for each round is 3 instructions long: ror+add+xor.) That gives 22*3 = ~66 cycles.
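
For reference, a sketch of Speck with a 32-bit block and 64-bit key in Python, mainly to show where the ror+add+xor chain sits in each round. The parameters (16-bit words, rotations by 7 and 2, 22 rounds) are the published ones for this variant; the function names and the example key and block are my own, and this is illustrative rather than a vetted implementation.

```python
MASK = 0xFFFF            # 16-bit words, so the block is 32 bits
ALPHA, BETA = 7, 2       # rotation amounts for the 16-bit-word variant
ROUNDS = 22


def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK


def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def expand_key(key):
    """key = (l2, l1, l0, k0) as four 16-bit words; returns the 22 round keys."""
    l = [key[2], key[1], key[0]]
    k = [key[3]]
    for i in range(ROUNDS - 1):
        # The key schedule reuses the round function with the counter i as the "key".
        l.append(((k[i] + ror(l[i], ALPHA)) & MASK) ^ i)
        k.append(rol(k[i], BETA) ^ l[-1])
    return k


def encrypt(x, y, round_keys):
    for k in round_keys:
        x = ((ror(x, ALPHA) + y) & MASK) ^ k  # ror -> add -> xor
        y = rol(y, BETA) ^ x
    return x, y


def decrypt(x, y, round_keys):
    for k in reversed(round_keys):
        y = ror(y ^ x, BETA)
        x = rol(((x ^ k) - y) & MASK, ALPHA)
    return x, y


round_keys = expand_key((0x1918, 0x1110, 0x0908, 0x0100))  # example key words
ciphertext = encrypt(0x6574, 0x694C, round_keys)
roundtrip = decrypt(*ciphertext, round_keys)
```

Every step is a plain 16-bit ALU operation, so the per-round cost comes down to the latency of that three-instruction chain.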

Using AES with a pshufb to take out the ShiftRows step would be 2 cycles for the pshufb and 4 cycles for each aesenc, so 6 cycles per round, and at 10 rounds, you have ~60 cycles.

It's quite close, and to say which one wins, we'd need to actually benchmark it. One is not clearly much faster than the other.
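
The estimate is just a serial-latency sum, written out here so the inputs are easy to swap. The latencies are the ones quoted above (one cycle each for ror, add and xor is my reading of "at most 3 cycles" per round), not measurements.

```python
def serial_latency(rounds, per_round_latencies):
    """Cycles for one block if every round waits on the previous one."""
    return rounds * sum(per_round_latencies)


speck_cycles = serial_latency(22, [1, 1, 1])  # ror + add + xor
aes_cycles = serial_latency(10, [2, 4])       # pshufb + aesenc

print(speck_cycles, aes_cycles)  # 66 60
```

This only models the latency of a single dependent chain. Throughput across many independent IDs could look quite different, which is another reason to benchmark.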

▲ botusaurus 5 minutes ago | parent [-]

maybe the reason they are so close is that the AES microcode is implementing exactly those operations

▲ fluoridation an hour ago | parent | prev [-]

Would it, though? Either way you're operating in ECB mode with 2^32 or 2^64 values. Why is one more secure than the other?

EDIT: What I mean is you can do cypher = truncate(plain ^ AES(zero_extend(plain))).

▲ 201984 11 minutes ago | parent [-]

> EDIT: What I mean is you can do cypher = truncate(plain ^ AES(zero_extend(plain))).

How would you decrypt that though? You truncated 3/4ths of the AES output needed to decrypt it.

I thought you were suggesting this:

`ciphertext = truncate(AES(key) ^ plaintext)`

And in this case, since AES(key) does not depend on the plaintext, it would just be XOR by a constant.
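
A quick way to check the decryption problem is to enumerate a small domain and count distinct outputs. In this sketch a keyed SHA-256 stands in for AES under a fixed key (an assumption made only to avoid a third-party AES library; it is not AES), and the domain is shrunk to 16 bits so the whole space can be enumerated. The key and the name `feedforward16` are made up.

```python
import hashlib

KEY = b"hypothetical-key"


def prf128(block: int) -> int:
    """Stand-in for AES under a fixed key on a 128-bit input (NOT real AES)."""
    digest = hashlib.sha256(KEY + block.to_bytes(16, "big")).digest()
    return int.from_bytes(digest[:16], "big")


def feedforward16(plain: int) -> int:
    """truncate(plain ^ F(zero_extend(plain))), scaled down to 16 bits."""
    return (plain ^ prf128(plain)) & 0xFFFF


# Enumerate every 16-bit plaintext and count how many distinct outputs appear.
distinct = len({feedforward16(p) for p in range(1 << 16)})
print(distinct, "distinct outputs for", 1 << 16, "inputs")
```

Fewer distinct outputs than inputs means some ciphertexts have more than one preimage, so no decryption function can exist. The truncated feed-forward behaves like a random function rather than a permutation, and the same would be expected with real AES in place of the stand-in.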