| ▲ | ismailmaj 2 hours ago | |
Assuming 2 bits per value (the first bit is the sign, the second bit is the value): actv = A[_:1] & B[_:1]; sign = A[_:0] ^ B[_:0]; dot = pop_count(actv & ~sign) - pop_count(actv & sign). It can probably be made more efficient by using a column-first format. Since we are in CPU land, we mostly deal with dot products sized to fit in cache; I'm not assuming a tiled matmul instruction, which would be unlikely to support this weird 1-bit format anyway. | ||
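A minimal Python sketch of the popcount trick above, for anyone who wants to check it. It assumes "column-first" means keeping the sign bits and the value bits in two separate bit-planes, which is only one reading of the comment. The names `pack`, `popcount` and `ternary_dot` are made up for illustration, and Python integers stand in for machine words.

```python
def pack(values):
    """Pack ternary values (-1, 0, +1) into two bit-planes: (sign_bits, value_bits).

    Assumed layout: bit i of value_bits is set when values[i] is nonzero,
    and bit i of sign_bits is set when values[i] is negative.
    """
    sign_bits = 0
    value_bits = 0
    for i, v in enumerate(values):
        if v not in (-1, 0, 1):
            raise ValueError(f"expected -1, 0 or 1, got {v!r}")
        if v != 0:
            value_bits |= 1 << i
        if v < 0:
            sign_bits |= 1 << i
    return sign_bits, value_bits


def popcount(x):
    """Count set bits of a non-negative integer."""
    return bin(x).count("1")


def ternary_dot(a, b):
    """Dot product of two packed ternary vectors using AND, XOR and popcount."""
    a_sign, a_value = a
    b_sign, b_value = b
    actv = a_value & b_value   # positions where both operands are nonzero
    sign = a_sign ^ b_sign     # positions where the product would be negative
    # actv is non-negative, so masking with ~sign stays non-negative in Python.
    return popcount(actv & ~sign) - popcount(actv & sign)


a = [1, -1, 0, 1, -1, 0, 1, 1]
b = [1, 1, -1, -1, -1, 0, 0, 1]
result = ternary_dot(pack(a), pack(b))
reference = sum(x * y for x, y in zip(a, b))
assert result == reference
```

On real hardware the same three operations would run over 64-bit words (or wider SIMD lanes) with a native popcount instruction, so each word handles 64 elements at once.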