| ▲ | Onavo 5 days ago |
| What do you mean by collision? |
|
| ▲ | qrios 5 days ago | parent | next [-] |
| If multiple cores try to get the same memory address, the MMU feeds only one core; the others have to wait. Depending on the type of RAM, this can cost a lot of cycles. GPU MMUs can handle multiple lines in parallel, but not 10k cores at the same time. The HBM is not able to transfer 3.5 TByte sequentially. |
| ▲ | whatshisface 5 days ago | parent [-] |
| Why is that? It seems like multiple cores requesting the same address would be easier for the MMU to serve, not harder. |
| ▲ | recursivecaveat 5 days ago | parent | next [-] |
| Not necessarily the exact same address (you can fix that in a program anyway with a broadcast tree), but the same memory bank. Imagine 1000 trains leaving one small town at the same time, versus 1000 trains leaving 1000 different towns simultaneously. At some point there are not enough transportation resources to move things out of a particular area at the desired parallelism. |
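| (The bank-conflict serialization described above can be sketched with a toy model. This is not real GPU behavior — the bank count, the address-to-bank mapping, and the one-word-per-bank-per-cycle rule are all simplifying assumptions, and real hardware can often broadcast identical addresses for free.) |

```python
# Toy model of memory bank conflicts (assumptions: 32 banks, addresses
# map to banks by modulo, one word served per bank per cycle; real GPUs
# differ and may broadcast identical addresses without conflict).
from collections import Counter

NUM_BANKS = 32  # assumed bank count

def cycles_for(addresses):
    """Cycles to serve one batch of word addresses: requests to the same
    bank are serialized, so the batch takes as long as its busiest bank."""
    per_bank = Counter(addr % NUM_BANKS for addr in addresses)
    return max(per_bank.values())

# 32 cores hitting 32 different banks: fully parallel, 1 cycle.
print(cycles_for(range(32)))                      # 1
# 32 cores hitting different addresses in the SAME bank
# ("1000 trains leaving one town"): fully serialized, 32 cycles.
print(cycles_for([i * NUM_BANKS for i in range(32)]))  # 32
```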
| ▲ | reliabilityguy 5 days ago | parent | prev | next [-] |
| It’s not that the fetching is the problem, but serving the data to many cores at the same time from a single source. |
| ▲ | supersour 5 days ago | parent [-] |
| I'm not familiar with GPU architecture — is there not a shared L2/L3 data cache from which this data would be shared? |
| ▲ | reliabilityguy 4 days ago | parent [-] |
| The MMU has a finite number of ports that drive the data to the consumers. An extreme case: all 32 cores want the same piece of data at the same time. |
|
| ▲ | qrios 5 days ago | parent | prev [-] |
| This is not my domain, but I assume the MMUs act like a switch, and something like multicast is not available here. I've tried to implement such a thing on an FPGA and it was extremely cost-intensive. |
|
|
|
| ▲ | agf 5 days ago | parent | prev [-] |
| I believe it's that the bus can only serve one chip at a time, so it actually has to be faster, since sometimes one chip's data will have to wait for another chip's transfer to finish first. |
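| (A back-of-envelope version of this: if a shared bus is time-sliced across N chips, each chip sees at most 1/N of the aggregate bandwidth, so the bus must be N times faster than a dedicated per-chip link. The numbers below are assumptions for illustration, not specs of any real part.) |

```python
# Hypothetical figures: an aggregate bus bandwidth shared by N chips.
BUS_BANDWIDTH = 3.5e12   # bytes/s, assumed HBM-class aggregate figure
NUM_CHIPS = 8            # assumed number of chips sharing the bus

# Time-sliced bus: each chip gets at most 1/N of the total.
per_chip = BUS_BANDWIDTH / NUM_CHIPS
print(f"{per_chip / 1e9:.1f} GB/s per chip")  # 437.5 GB/s
```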