Stanford's CS224N told us it was about preserving identity because non-linear neural networks are bad at learning identity.