Would this matter for performance? You already have so many execution units that are actually difficult to keep fully fed even when decoding instructions and data at the speed of cache.