Remix.run Logo
fluoridation 3 hours ago

In that case just use CTR mode, no?

tptacek 17 minutes ago | parent | next [-]

https://www.cs.ucdavis.edu/~rogaway/papers/thorp.pdf

(Not that this is the only solution but that it motivates the problem of why you can't just naively apply AES to the problem).

201984 2 hours ago | parent | prev | next [-]

In the context of encrypting 32 or 64 bit IDs, where there is no nonce, that'd be equivalent to XOR encryption and much weaker than TFA's small block ciphers.

adrian_b an hour ago | parent | next [-]

If you really want to encrypt and decrypt 32-bit numbers without having any nonces available, the fastest way on non-microcontroller CPUs remains using the AES instructions.

You can exploit the fact that the core of AES consists of 32-bit invertible mixing functions. In order to extend AES to 128-bit, a byte permutation is used, which mixes the bytes of the 32-bit words.

The AES instructions are such, that you can cancel the byte permutation. In this case, you can use the AES instructions to encrypt separately four 32-bit words, instead of one 128-bit block.

Similarly by canceling the standard byte permutation and replacing it with separate permutations on the 2 halves, you can make the AES instructions independently encrypt two 64-bit words.

These AES modifications remain faster than any software cipher.

How to cancel the internal permutation and replace it with external shuffle instructions was already described in the Intel white paper published in 2010, at the launch of Westmere, the first CPU with AES instructions.

201984 18 minutes ago | parent [-]

Are you certain using AES is still faster? Let's say for a 32-bit block size and 64-bit key.

From https://en.wikipedia.org/wiki/Speck_(cipher), that Speck combination would use 22 rounds, and using the instruction timings for Zen 5 from https://instlatx64.github.io/InstLatx64/AuthenticAMD/Authent..., it looks like each round would take at most 3 cycles. (Dependency chain for each round is 3 instructions long, ror+add+xor). 22*3 = ~66 cycles.

Using AES with a pshufb to take out the ShiftRows step would be 2 cycles for the pshufb and 4 cycles for each aesenc, and at 10 rounds, you have ~60 cycles.

It's quite close, and to say which one wins, we'd need to actually benchmark it. One is not clearly much faster than the other.

botusaurus 6 minutes ago | parent [-]

maybe the reason they are so close is that the AES microcode is inplementing exactly those operations

fluoridation an hour ago | parent | prev [-]

Would it, though? Either way you're operating in ECB mode with 2^32 or 2^64 values. Why is one more secure than the other?

EDIT: What I mean is you can do cypher = truncate(plain ^ AES(zero_extend(plain))).

201984 12 minutes ago | parent [-]

>EDIT: What I mean is you can do cypher = truncate(plain ^ AES(zero_extend(plain))).

How would you decrypt that though? You truncated 3/4ths of the AES output needed to decrypt it.

I thought you were suggesting this:

  ciphertext = truncate(AES(key) ^ plaintext)
And in this case, since AES(key) does not depend on the plaintext, it would just be XOR by a constant.
Joker_vD 3 hours ago | parent | prev [-]

Some people just itch to use something custom and then to have to think about it. Which can bring amazing results, sure, but it can also bring spectacular disasters as well, especially when we're talking about crypto.

giantrobot 3 hours ago | parent [-]

The article is less about crypto and more about improving UUID (and IDs in general) with small block ciphers. It's a low impact mechanism to avoid leaking data that UUID by design does leak. It also doesn't require a good source of entropy.