That's not entirely wrong.
https://gpuopen.com/download/RDNA_Architecture_public.pdf
I've been showing this one to people for a few years as a good introduction to how RDNA diverged from the GCN->CDNA line.
The main thing they did was change where wavefront steps (essentially, quasi-VLIW packets) execute: instead of sitting at the head of the pipeline (which owns 4x SIMD16 ALUs = 64 items) and requiring 64 threads to execute concurrently (and thus 64x the registers/LDS/etc.), RDNA issues non-blocking segments of the packet into per-ALU sub-pipelines, so far fewer concurrent threads are needed to hit peak throughput (and, in many cases, far fewer live registers get burned on intermediates that never leave the packet).
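Some back-of-the-envelope residency arithmetic, using only the unit sizes from the whitepaper's block diagrams (GCN compute unit: 4x SIMD16 / wave64; RDNA workgroup processor: 4x SIMD32 / wave32). This is an illustrative sketch of the "fewer concurrent threads" point, not a performance model:

    // Threads that must be resident just to have one wave on every SIMD,
    // ignoring latency hiding entirely. Illustrative arithmetic only.
    #include <cstdio>

    int main() {
        const int gcn_threads  = 4 * 64;   // 4x SIMD16, wave64: 256 threads over 64 lanes
        const int rdna_threads = 4 * 32;   // 4x SIMD32, wave32: 128 threads over 128 lanes
        std::printf("GCN CU  : %d threads minimum, %dx per ALU lane\n",
                    gcn_threads, gcn_threads / 64);
        std::printf("RDNA WGP: %d threads minimum, %dx per ALU lane\n",
                    rdna_threads, rdna_threads / 128);
        return 0;
    }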
GCN is optimized for workloads with low instruction-level parallelism but massive thread-level parallelism. Nvidia, since the start of its current architecture family tree, has been optimized for the opposite: high instruction-level parallelism, but not the simple, embarrassingly parallel case. RDNA is built to handle both the GCN-optimal and the Nvidia-optimal workloads.
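To make that distinction concrete, here is a minimal CUDA-style sketch of my own (not from the whitepaper): the first kernel is one long dependent chain, which only runs fast when many other wavefronts are resident to hide its latency; the second gives each thread four independent chains, the kind of in-wave parallelism a narrower RDNA/Nvidia-style issue pipeline can chew through even at low occupancy.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Low ILP: every FMA depends on the previous one, so a wave only keeps
    // its ALU busy if *other* resident waves fill the latency gaps -- the
    // "throw thread-parallelism at it" case GCN was built for.
    __global__ void dependent_chain(float* out, int iters) {
        float x = threadIdx.x * 1e-3f;
        for (int i = 0; i < iters; ++i)
            x = fmaf(x, 1.0001f, 1e-6f);          // serial dependency
        out[blockIdx.x * blockDim.x + threadIdx.x] = x;
    }

    // Higher ILP: four independent chains per thread give the scheduler
    // back-to-back work from the *same* wave, which needs far fewer
    // resident waves to stay busy.
    __global__ void independent_chains(float* out, int iters) {
        float a = threadIdx.x * 1e-3f, b = a + 1.f, c = a + 2.f, d = a + 3.f;
        for (int i = 0; i < iters; ++i) {
            a = fmaf(a, 1.0001f, 1e-6f);
            b = fmaf(b, 1.0002f, 1e-6f);
            c = fmaf(c, 1.0003f, 1e-6f);
            d = fmaf(d, 1.0004f, 1e-6f);
        }
        out[blockIdx.x * blockDim.x + threadIdx.x] = a + b + c + d;
    }

    int main() {
        const int threads = 256, blocks = 1024, iters = 1 << 16;
        float* out = nullptr;
        cudaMalloc(&out, threads * blocks * sizeof(float));
        dependent_chain<<<blocks, threads>>>(out, iters);
        independent_chains<<<blocks, threads>>>(out, iters);
        cudaDeviceSynchronize();
        cudaFree(out);
        std::puts("done");
        return 0;
    }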
Since that document was written, RDNA has also been steadily removing the roadblocks around this fundamental difference. RDNA4, the generation that just came out, made what is probably the most influential change: it enlarged the packet-processing queue so it can schedule more packets in parallel and feed more segments of each packet into their per-ALU slots. In software that performed badly on every GPU (GCN, earlier RDNA, anything Nvidia), a 9070XT can now perform like a 7900XTX at 2/3rds the watts and 2/3rds the dollars.
While CDNA has gone blow for blow with Nvidia's offerings since its name change, RDNA has eradicated the gap in gaming performance. Nvidia functionally doesn't have a competitive desktop product below a 5090 now, and the early 60-series rumors aren't spicy enough to make me think Nvidia has an answer coming, either.