bobmcnamara 5 days ago

Intel and AMD I'd reckon. Apple went wide with their buses.

to11mtm 5 days ago | parent [-]

Well, each channel needs a lot of pins. I don't think all 288/262 DIMM pins need to go to the CPU, but a large number of them do, I'd wager; the old LGA 1366 (tri-channel) and LGA 1151 (dual-channel) sockets are probably as close as we can get to a simple reference point [0].
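Purely as a back-of-envelope, and ignoring every other difference between those two platforms, the socket sizes alone suggest something like this per channel (Python just for the arithmetic, not a real per-channel figure):

    # Very rough: attribute the whole pin-count difference between the two
    # sockets to the extra memory channel. Ignores QPI vs DMI, integrated
    # PCIe, power/ground pins, etc., so treat it as a ballpark only.
    lga_1366_pins, lga_1366_channels = 1366, 3   # tri-channel DDR3
    lga_1151_pins, lga_1151_channels = 1151, 2   # dual-channel DDR4

    extra_pins_per_channel = (lga_1366_pins - lga_1151_pins) / (lga_1366_channels - lga_1151_channels)
    print(extra_pins_per_channel)  # ~215, same ballpark as the ~120-150 signal
                                   # lines a 64-bit DDR channel actually needs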

Apple, for better or worse, based on a quick and sloppy count of a reballing jig [1], has something on the order of 2,500-2,700 balls on an M2 CPU.

I think AMD's FP11 'socket' (it's really just a standard ball grid array) pinout is something on the order of 2,000-2,100 balls, and that gets you four 64-bit DDR channels. (I think Apple works a bit differently and uses 16-bit channels, which is why the 'channel count' for an M2 is higher.)
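As a sketch of how those widths turn into bandwidth (channel counts/widths as above; the data rates, LPDDR5-6400 for the M2 and LPDDR5X-8000 on the FP11 side, are my assumptions and may not match any specific SKU):

    # Peak bandwidth = total bus width (bytes) x transfer rate.
    # Channel layouts per the comment above; MT/s figures are assumed.
    def peak_bandwidth_gb_s(channels, bits_per_channel, mt_per_s):
        return channels * bits_per_channel / 8 * mt_per_s / 1000

    print(peak_bandwidth_gb_s(8, 16, 6400))   # M2,   8 x 16-bit -> ~102 GB/s
    print(peak_bandwidth_gb_s(4, 64, 8000))   # FP11, 4 x 64-bit -> ~256 GB/s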

Which is a roundabout way of saying: AMD and Intel probably can match the bandwidth, but doing so would likely require moving to soldered CPUs, which would be a huge paradigm shift for all the existing board makers, etc.

[0] - They do have other tradeoffs; namely, 1151 has built-in PCIe, but on the other hand the link to the PCH is, AFAIR, a good bit thinner than the QPI link on the 1366.

[1] - https://www.masterliuonline.com/products/a2179-a1932-cpu-reb... . I counted ~55 rows along the top and ~48 rows on the side...
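(The arithmetic behind that ~2,500-2,700 figure, if anyone wants to check it:)

    rows, cols = 55, 48   # rough counts from the jig photo in [1]
    print(rows * cols)    # 2640 positions if fully populated, hence ~2500-2700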

bobmcnamara 4 days ago | parent [-]

Completely agree, and this is a bit of a ramble...

I think part of it might be that Apple recognized that integrated GPUs require a lot of bulk memory bandwidth. I noticed this with their tablet-derivative chips, whose memory bandwidth tended to scale with screen size, while Samsung and Qualcomm didn't bother for ages. And it sucked doing high-speed vision systems on those chips because of it.

For years Intel had been slowly beefing up its L2/L3/L4 caches.

The M1 Max is somewhere between an Nvidia 1080 and a 1080 Ti in bulk bandwidth. The lowest-end M chips aren't competitive, but nearly everything above that overlaps even the current-gen Nvidia 4050+ offerings.
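The approximate published peak figures I'm going from (from memory, so double-check before leaning on them):

    # Approximate published peak memory bandwidth, GB/s.
    peak_gb_s = {
        "GTX 1080    (256-bit GDDR5X)": 320,
        "M1 Max      (512-bit LPDDR5)": 400,
        "GTX 1080 Ti (352-bit GDDR5X)": 484,
    }
    for part, gb_s in sorted(peak_gb_s.items(), key=lambda kv: kv[1]):
        print(f"{part}: ~{gb_s} GB/s")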

to11mtm 4 days ago | parent [-]

Good ramble though :)

Yeah, Apple definitely realized they should do something, and as much as I don't care for their ecosystem, I think they were very smart in how they handled the need for memory bandwidth. E.g. having more 16-bit channels vs. fewer 64-bit channels probably allows for better power-management characteristics, as far as being able to relocate data on 'sleep'/'wake' and thus leave more of the RAM powered off.

That, plus the good UMA implementation, has left the rest of the industry 'not even playing catch-up', i.e.:

- Intel failing to capitalize on the opportunity of a 'VRAM-heavy' low-end card to gain market share,

- AMD failing to bite the bullet and meaningfully try to fight Nvidia on memory/bandwidth margin...

- Nvidia just raking that margin in...

- By this point you'd think Qualcomm would do an 'AI accelerator' reference platform just to try...

- I'm guessing that whatever efforts are happening in China, they're too busy trying to fill internal needs to bother boasting and tipping their hand; better to let outside companies continue to overspend on the current paradigm.