somat 6 days ago
The Nintendo 64's RDP (graphics/memory controller) used 9-bit bytes. This was done for graphics reasons, native antialiasing if I understand it correctly. The CPU can't use the extra bit; it still only sees 8-bit bytes.

https://www.youtube.com/watch?v=DotEVFFv-tk (Kaze Emanuar - The Nintendo 64 has more RAM than you think)

To summarize the relevant part of the video: the RDP wants to store each pixel in 18 bits (5 bits red, 5 bits green, 5 bits blue, and 3 bits of triangle coverage). It then uses this coverage information to compute a primitive but fast antialiasing. So SGI went with two 9-bit bytes per pixel, plus some magic in the RDP (remember, it's also the memory controller) so that the CPU sees the 8-bit bytes it expects.

Memory on the N64 is very weird. It is basically the same idea as PCIe, but for main memory. PCI is a big, fat bus that is hard to speed up; PCIe is a small, narrow, very fast bus. So the CPU was clocked at 93 MHz, but the memory sat on a 9-bit bus clocked at 250 MHz. They hoped this very fast narrow memory would be enough for everyone, but having the graphics chip also be the memory controller made graphics performance very sensitive to memory load. So much so that the main thing that helps an N64 game reach a higher frame rate is having the CPU do as few memory accesses as possible, which in practical terms means keeping it idle as much as possible.

This has a strange side effect. On most architectures a common optimization is to trade calculation for memory (unrolled loops, lookup tables, ...), but on the N64 it can be the opposite. If you can make your code do more calculation with fewer memory accesses, you use the CPU better, because it is mostly sitting idle anyway to leave the RDP most of the memory bandwidth.
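A minimal sketch of the 18-bit pixel described above, split across two 9-bit bytes. The bit layout is an assumption chosen for illustration, not taken from SGI/Nintendo documentation: R, G, B and the top coverage bit fill the 16 bits the CPU can see, and the two low coverage bits ride in the hidden 9th bit of each byte. `pack_pixel`, `unpack_pixel` and `cpu_view` are hypothetical names, not any real N64 SDK API.

```python
# ASSUMPTION: hypothetical bit layout, for illustration only.
# Each 9-bit "byte" = 1 hidden bit (bit 8) + 8 CPU-visible bits (bits 0-7).

def pack_pixel(r: int, g: int, b: int, coverage: int) -> tuple[int, int]:
    """Pack 5-bit R, G, B and 3-bit coverage into two 9-bit bytes (hi, lo)."""
    for name, value, bits in (("r", r, 5), ("g", g, 5), ("b", b, 5), ("coverage", coverage, 3)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    # The 16 bits the CPU can see: RGB plus the top coverage bit.
    visible = (r << 11) | (g << 6) | (b << 1) | (coverage >> 2)
    # The two low coverage bits go into the hidden 9th bit of each byte.
    hi = (((coverage >> 1) & 1) << 8) | (visible >> 8)
    lo = ((coverage & 1) << 8) | (visible & 0xFF)
    return hi, lo


def unpack_pixel(hi: int, lo: int) -> tuple[int, int, int, int]:
    """The graphics side's view: all 18 bits, decoded back to (r, g, b, coverage)."""
    visible = ((hi & 0xFF) << 8) | (lo & 0xFF)
    r = (visible >> 11) & 0x1F
    g = (visible >> 6) & 0x1F
    b = (visible >> 1) & 0x1F
    coverage = ((visible & 1) << 2) | (((hi >> 8) & 1) << 1) | ((lo >> 8) & 1)
    return r, g, b, coverage


def cpu_view(hi: int, lo: int) -> int:
    """The CPU's view: only the low 8 bits of each 9-bit byte, i.e. a 16-bit word."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)
```

For example, `pack_pixel(31, 0, 16, 5)` gives `(0xF8, 0x121)`; `unpack_pixel` recovers `(31, 0, 16, 5)` from it, while `cpu_view` of the same pair is just `0xF821`, so the two low coverage bits are invisible from the CPU side.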
fc417fc802 6 days ago
> a common optimizing operation for most architectures is to trade calculation for memory (unroll loops, lookup tables...)

That really depends. A cache miss adds eons of latency, so it is far worse than doing a few extra cycles of work, although depending on the workload the reorder buffer might manage to hide the penalty entirely. Memory bandwidth as a whole is also incredibly scarce relative to CPU clock cycles. The only time it's a sure win is when you trade instruction count for data in registers or L1 cache hits, but those are themselves very scarce resources.
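To make the calculation-versus-memory trade-off in this subthread concrete, here is a small sketch of the same job done both ways: counting set bits in a 16-bit value with a 256-entry lookup table (two memory loads per call) versus with SWAR bit tricks (a handful of arithmetic ops, no table). Popcount is just an illustrative stand-in chosen here, and Python says nothing about real timings; which version wins depends on the machine, as discussed above.

```python
# Memory-heavy approach: precomputed table, two lookups per call.
POPCOUNT_TABLE = [bin(i).count("1") for i in range(256)]


def popcount16_lut(x: int) -> int:
    """Count set bits in a 16-bit value via two table loads."""
    return POPCOUNT_TABLE[x & 0xFF] + POPCOUNT_TABLE[(x >> 8) & 0xFF]


def popcount16_swar(x: int) -> int:
    """Count set bits in a 16-bit value with arithmetic only (no table)."""
    x &= 0xFFFF
    x = x - ((x >> 1) & 0x5555)              # counts per 2-bit field
    x = (x & 0x3333) + ((x >> 2) & 0x3333)   # counts per 4-bit field
    x = (x + (x >> 4)) & 0x0F0F              # counts per byte
    return (x + (x >> 8)) & 0x1F             # add the two byte counts (max 16)
```

Both functions agree on every 16-bit input. The table version spends memory (and, on real hardware, memory bandwidth) to save arithmetic; the SWAR version spends arithmetic to avoid touching memory, which is the direction the N64 comment argues for.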
01HNNWZ0MV43FF 6 days ago
Yeah, but if the CPU can't use it, then it's kinda like saying your computer has 1,000 cores, except they're in the GPU and can't run general-purpose branchy code.

In fact, it's not even useful to call it a "64-bit system" just because it has some 64-bit registers. It never addresses more than 4 GB of anything.