While reading this article, I wondered whether the CPU's CRC32C instruction would be a good building block for hash functions; I think its latency is about the same as an integer multiply.
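To make the idea concrete, here is a rough software model of what such a hash might look like. It is only a sketch: `crc32c_update` is a bit-at-a-time stand-in for the hardware instruction (for example the x86 SSE4.2 `crc32` instruction), and `hash_u64` with its `seed` parameter is a hypothetical helper of my own, not something from the article.

```python
# Castagnoli polynomial in bit-reflected form, as used by CRC32C.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_update(crc: int, data: bytes) -> int:
    """Fold bytes into a 32-bit CRC state, one bit at a time.

    Like the hardware instruction, this applies no initial or final XOR;
    it only advances the accumulator.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out the low bit; if it was set, subtract (XOR) the polynomial.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data: bytes) -> int:
    """Standard CRC32C checksum: all-ones initial state and final inversion."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def hash_u64(key: int, seed: int = 0) -> int:
    """Hypothetical hash: fold the 8 little-endian bytes of a 64-bit key into seed."""
    key_bytes = (key & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")
    return crc32c_update(seed & 0xFFFFFFFF, key_bytes)


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))  # standard check value: 0xe3069283
    print(hex(hash_u64(0xDEADBEEF)))
```

One caveat worth keeping in mind: the CRC step is linear over GF(2), so with a zero seed `hash_u64(a ^ b) == hash_u64(a) ^ hash_u64(b)` and `hash_u64(0) == 0`. That is probably fine for hash tables with benign keys, but it likely needs extra mixing if the inputs can be adversarial.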