ajb 5 hours ago

I would argue that they are not the same, but there is a symmetry between them.

The central problem of cryptography is to prevent inference about either the key or the plaintext, despite the requirement that the plaintext be reconstructible from ciphertext plus key. So ciphers have to mix information almost perfectly.
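
To make the constraint concrete, here is a toy sketch (not a real cipher; the structure and constants are invented for illustration): a single add-rotate-xor step mixes key material into the state, yet remains exactly invertible, so the plaintext is recoverable from ciphertext plus key.

    MASK = 0xFFFFFFFF  # work in 32-bit words

    def rotl(x, n):
        return ((x << n) | (x >> (32 - n))) & MASK

    def rotr(x, n):
        return ((x >> n) | (x << (32 - n))) & MASK

    def encrypt_word(p, k):
        return rotl((p + k) & MASK, 7) ^ k   # add, rotate, xor

    def decrypt_word(c, k):
        return (rotr(c ^ k, 7) - k) & MASK   # undo each step in reverse

    p, k = 0xDEADBEEF, 0x0BADF00D
    c = encrypt_word(p, k)
    assert decrypt_word(c, k) == p           # exact reconstruction with the key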

Machine learning is possible because, in the absence of perfect mixing, inference is possible given many input-output pairs, even when the signal is many decibels below the noise floor. The information about which parameters need changing is therefore present in the output despite many subsequent layers of processing. This means a lot of mixing can be tolerated, and tolerating it is necessary: you don't know in advance what the data flow should look like in detail, so the NN has to provide as many options as possible.
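
A minimal sketch of that claim, with made-up numbers: the per-sample signal here sits about 20 dB below the noise, yet gradient descent over enough input-output pairs still recovers the parameter, because the gradient averages the noise away.

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = 0.1
    n = 100_000
    x = rng.standard_normal(n)
    y = true_w * x + rng.standard_normal(n)   # noise std 1 vs signal std 0.1: -20 dB

    w, lr = 0.0, 0.5
    for _ in range(200):
        grad = np.mean((w * x - y) * x)       # gradient of mean squared error / 2
        w -= lr * grad

    print(w)  # ~0.1: parameter recovered despite heavy per-sample noise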

ogogmad 3 hours ago

ChaCha20 was discovered using a computer search that tested resistance to certain attacks. Hence the architecture came first and the parameters came next. Is there any link with NN gradient descent? It would likely be an abstract one.
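
As a toy illustration of that workflow (everything below is invented for the sketch; it is not ChaCha20's actual design process): fix an ARX round structure, then brute-force the rotation "parameters" against a crude avalanche score.

    import random

    MASK = 0xFFFFFFFF

    def rotl(x, n):
        return ((x << n) | (x >> (32 - n))) & MASK

    def toy_round(a, b, r1, r2):
        # fixed "architecture": two add-rotate-xor steps;
        # the rotation amounts r1, r2 are the searchable "parameters"
        a = (a + b) & MASK
        b = rotl(b ^ a, r1)
        a = (a + b) & MASK
        b = rotl(b ^ a, r2)
        return a, b

    def avalanche(r1, r2, trials=300):
        # average output bits flipped when one input bit flips; 32/64 is ideal
        rng = random.Random(0)
        total = 0
        for _ in range(trials):
            a, b = rng.getrandbits(32), rng.getrandbits(32)
            y = toy_round(a, b, r1, r2)
            z = toy_round(a ^ (1 << rng.randrange(32)), b, r1, r2)
            total += bin((y[0] ^ z[0]) << 32 | (y[1] ^ z[1])).count("1")
        return total / trials

    best = max(((r1, r2) for r1 in range(1, 32) for r2 in range(1, 32)),
               key=lambda rs: avalanche(*rs))
    print(best, avalanche(*best))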

tptacek 2 hours ago

I don't know how true this is? Salsa20 seems like a pretty standard ARX design that builds a hash function run in counter mode; there's a detailed paper explaining Bernstein's decisions.
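
For reference, ARX means the round is built purely from modular addition, fixed rotations, and xor. The ChaCha quarter-round (the Salsa20 variant ogogmad mentioned) is exactly that, and is checkable against the RFC 8439 test vector:

    MASK = 0xFFFFFFFF

    def rotl(x, n):
        return ((x << n) | (x >> (32 - n))) & MASK

    def quarter_round(a, b, c, d):
        # add, rotate, xor -- four times over, per RFC 8439
        a = (a + b) & MASK; d = rotl(d ^ a, 16)
        c = (c + d) & MASK; b = rotl(b ^ c, 12)
        a = (a + b) & MASK; d = rotl(d ^ a, 8)
        c = (c + d) & MASK; b = rotl(b ^ c, 7)
        return a, b, c, d

    # test vector from RFC 8439, section 2.1.1
    assert quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567) == \
           (0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)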