camel-cdr 3 days ago
> trace caches

They don't anymore; they have uop caches. But trace caches are great, and Apple uses them [1]. They let you collapse taken branches into a single fetch. That is extremely important because the average number of instructions per taken branch is only about 10-15 [2]. With a 10-wide frontend, every second fetch would be half utilized or worse.

> extra caches

This is one thing I don't understand: why not replace the L1I with the uop cache entirely?

I quite like what Ventana does with the Veyron V2/V3 [3,4]. They replaced the L1I with a macro-op trace cache, which can collapse taken branches, do basic instruction fusion, and do more advanced fusion on hot code paths.

[1] https://www.realworldtech.com/forum/?threadid=223220

[2] https://lists.riscv.org/g/tech-profiles/attachment/353/0/RIS... (page 10)
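The "half utilized" arithmetic can be made concrete with a toy Python model. This is an illustrative sketch, not how any real core works. It assumes three things: every run of instructions ends in a taken branch, a conventional fetch group stops at each taken branch, and an idealized trace cache can pack instructions across any number of taken branches. It ignores cache-line alignment and per-line branch limits. The block length of 12 is a hypothetical value picked from the 10-15 range above.

```python
import math


def fetch_cycles_conventional(block_lengths, width):
    """Fetches needed when every fetch group ends at a taken branch.

    Each run of instructions ending in a taken branch is fetched on its own,
    so a run of n instructions costs ceil(n / width) fetches.
    """
    return sum(math.ceil(n / width) for n in block_lengths)


def fetch_cycles_trace(block_lengths, width):
    """Fetches needed with an idealized trace cache.

    Assumption: a fetch group may span any number of taken branches, so only
    the total instruction count matters.
    """
    return math.ceil(sum(block_lengths) / width)


def utilization(block_lengths, width, fetches):
    """Fraction of fetch slots that carried a useful instruction."""
    return sum(block_lengths) / (fetches * width)


if __name__ == "__main__":
    width = 10              # the 10-wide frontend from the comment
    blocks = [12, 12, 12]   # hypothetical: 12 instructions per taken branch
    conv = fetch_cycles_conventional(blocks, width)
    trace = fetch_cycles_trace(blocks, width)
    print(conv, utilization(blocks, width, conv))    # 6 0.6
    print(trace, utilization(blocks, width, trace))  # 4 0.9
```

With 12 instructions per taken branch, each block needs a full fetch plus a second fetch that carries only 2 instructions. Packing across the branches recovers most of those empty slots.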
adgjlsfhk1 2 days ago
You need both. Branches don't tell you "jump to this micro-op"; they say "jump to this address", so you need the address numbering of a normal L1i.
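A toy Python sketch of this point follows. The names are hypothetical, and it is not modeled on Ventana's or any other real design. Branch targets arrive as addresses, so the decoded-uop store is looked up by address. On a miss, something address-indexed has to supply the raw instructions to decode: here that is the L1i, reduced to a dict of mnemonics.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrontend:
    # Hypothetical L1i: instruction address -> raw instruction (a mnemonic string here).
    l1i: dict
    # Decoded-uop cache, tagged by the same instruction address.
    uop_cache: dict = field(default_factory=dict)
    # Counts how often we had to go back to the L1i and decode.
    decodes: int = 0

    def _decode(self, addr):
        # Stand-in for the decoders: turn one raw instruction into a list of uops.
        self.decodes += 1
        return [f"uop<{self.l1i[addr]}>"]

    def fetch(self, target_addr):
        # A branch hands us a byte address, not a uop index, so look up by address.
        uops = self.uop_cache.get(target_addr)
        if uops is None:
            # Miss: fall back to the address-indexed L1i, decode, and fill the uop cache.
            uops = self._decode(target_addr)
            self.uop_cache[target_addr] = uops
        return uops


if __name__ == "__main__":
    fe = ToyFrontend(l1i={0x1000: "add", 0x1004: "jmp 0x1000"})
    print(fe.fetch(0x1000))  # ['uop<add>']  (first visit: decoded from the L1i)
    print(fe.fetch(0x1000))  # ['uop<add>']  (second visit: uop-cache hit)
    print(fe.decodes)        # 1
```

Keying the uop store by instruction address is what lets a branch target hit it directly. The miss path is where the address-indexed L1i comes back in.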