mohinder 3 days ago
You don't need all the permutations. If there are 32 bytes in a cache line, then each instruction can only start at one of 32 possible positions. So if you want to decode N instructions per cycle, you need N 32-to-1 muxes. You can reduce the number of inputs to the later muxes, since instructions can't be zero-size.
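A rough software model of that mux arrangement, just to make the counting concrete. It assumes the length of the instruction starting at each byte is already known (the `lengths` table is hypothetical, standing in for whatever length-predecode the hardware does), and the function names are made up for illustration:

```python
LINE_BYTES = 32  # cache line size used in the comment above


def mux_inputs(slot, line_bytes=LINE_BYTES):
    """Candidate start offsets for decode slot `slot` (0-based).

    Every earlier instruction is at least 1 byte long, so slot k can
    never start before byte k, and its mux needs k fewer inputs.
    """
    return line_bytes - slot


def select_starts(lengths, n_slots, first=0):
    """Return the byte offset each decode slot's mux would select.

    lengths[i] is the assumed, pre-computed length of an instruction
    that starts at byte i of the line.
    """
    starts = []
    offset = first
    for _ in range(n_slots):
        if offset >= len(lengths):
            break  # ran off the end of the cache line
        starts.append(offset)
        offset += lengths[offset]
    return starts


if __name__ == "__main__":
    lengths = [2] * LINE_BYTES  # toy case: every instruction is 2 bytes
    print(select_starts(lengths, 4))
    print([mux_inputs(k) for k in range(4)])
```

In real hardware the selects would be resolved in parallel rather than by walking the line sequentially; the loop here only shows which offset each slot ends up picking and why later slots have fewer candidates.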
monocasa 3 days ago
It was even simpler until very recently: the decode stage would only look at a floating window of at most 16 bytes.