That's not entirely wrong.
https://gpuopen.com/download/RDNA_Architecture_public.pdf
I've been showing this one to people for a few years as a good introduction to how RDNA diverged from the GCN->CDNA line.
The main thing they did was change where wavefront steps (essentially, quasi-VLIW packets) execute: instead of sitting at the head of the pipeline (which owns 4x SIMD16 ALUs = 64 items) and requiring 64 threads to execute concurrently (and thus 64x the registers/LDS/etc.), RDNA issues non-blocking segments of the packet into per-ALU sub-pipelines, so far fewer concurrent threads are needed to hit peak throughput (and, in many cases, far fewer live registers get burned on intermediates that never leave the packet).
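Some back-of-the-envelope residency arithmetic, using only the unit sizes from the whitepaper's block diagrams (GCN compute unit: 4x SIMD16 / wave64; RDNA workgroup processor: 4x SIMD32 / wave32). This is an illustrative sketch of the "fewer concurrent threads" point, not a performance model:

    // Threads that must be resident just to have one wave on every SIMD,
    // ignoring latency hiding entirely. Illustrative arithmetic only.
    #include <cstdio>

    int main() {
        const int gcn_threads  = 4 * 64;   // 4x SIMD16, wave64: 256 threads over 64 lanes
        const int rdna_threads = 4 * 32;   // 4x SIMD32, wave32: 128 threads over 128 lanes
        std::printf("GCN CU  : %d threads minimum, %dx per ALU lane\n",
                    gcn_threads, gcn_threads / 64);
        std::printf("RDNA WGP: %d threads minimum, %dx per ALU lane\n",
                    rdna_threads, rdna_threads / 128);
        return 0;
    }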
GCN is optimized for workloads with low instruction-level parallelism but massive thread-level parallelism. Nvidia, since the start of its current architecture family tree, has been optimized for the opposite: high instruction-level parallelism, but not the simple, embarrassingly parallel case. RDNA is built to handle both the GCN-optimal and the Nvidia-optimal workloads.
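To make that distinction concrete, here is a minimal CUDA-style sketch of my own (not from the whitepaper): the first kernel is one long dependent chain, which only runs fast when many other wavefronts are resident to hide its latency; the second gives each thread four independent chains, the kind of in-wave parallelism a narrower RDNA/Nvidia-style issue pipeline can chew through even at low occupancy.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Low ILP: every FMA depends on the previous one, so a wave only keeps
    // its ALU busy if *other* resident waves fill the latency gaps -- the
    // "throw thread-parallelism at it" case GCN was built for.
    __global__ void dependent_chain(float* out, int iters) {
        float x = threadIdx.x * 1e-3f;
        for (int i = 0; i < iters; ++i)
            x = fmaf(x, 1.0001f, 1e-6f);          // serial dependency
        out[blockIdx.x * blockDim.x + threadIdx.x] = x;
    }

    // Higher ILP: four independent chains per thread give the scheduler
    // back-to-back work from the *same* wave, which needs far fewer
    // resident waves to stay busy.
    __global__ void independent_chains(float* out, int iters) {
        float a = threadIdx.x * 1e-3f, b = a + 1.f, c = a + 2.f, d = a + 3.f;
        for (int i = 0; i < iters; ++i) {
            a = fmaf(a, 1.0001f, 1e-6f);
            b = fmaf(b, 1.0002f, 1e-6f);
            c = fmaf(c, 1.0003f, 1e-6f);
            d = fmaf(d, 1.0004f, 1e-6f);
        }
        out[blockIdx.x * blockDim.x + threadIdx.x] = a + b + c + d;
    }

    int main() {
        const int threads = 256, blocks = 1024, iters = 1 << 16;
        float* out = nullptr;
        cudaMalloc(&out, threads * blocks * sizeof(float));
        dependent_chain<<<blocks, threads>>>(out, iters);
        independent_chains<<<blocks, threads>>>(out, iters);
        cudaDeviceSynchronize();
        cudaFree(out);
        std::puts("done");
        return 0;
    }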
Since that document was written, RDNA has also been steadily removing the roadblocks around this fundamental difference. RDNA4, the generation that just came out, made what is probably the most influential change: it enlarged the packet-processing queue so it can schedule more packets in parallel and feed more segments of each packet into their per-ALU slots. In software that performed badly on every GPU (GCN, earlier RDNA, anything Nvidia), a 9070XT can now perform like a 7900XTX at 2/3rds the watts and 2/3rds the dollars.
While CDNA has gone blow for blow with Nvidia's offerings since its name change, RDNA has eradicated the gap in gaming performance. Nvidia functionally doesn't have a competitive desktop product below a 5090 now, and the early 60-series rumors aren't spicy enough to make me think Nvidia has an answer coming, either.