The article is literally about how rote translation of CUDA code to AMD hardware will always give sub-par performance. Even if you wrangled an AI into doing the grunt work for you, porting heavily NV-tuned code to non-NV hardware would still be a losing strategy.

measurablefunc 6 hours ago

The point of AI is that it is not a rote translation & 1:1 mapping.

jsheard 5 hours ago

> Take the ROCm specification, take your CUDA codebase, let one of the agentic AIs translate it all into ROCm

...sounds like asking for a 1:1 mapping to me. If you meant asking the AI to transmute the code from NV-optimal to AMD-optimal as it goes along, you could certainly try doing that, but the idea is nothing more than AI fanfic until someone shows it actually working.

measurablefunc 5 hours ago

Now that I have clarified the point about AI optimizing the code from CUDA to fit AMD's runtime, what is your contention about the possibility of such a translation?

bigyabai 5 hours ago

There is an old programmer's joke about writing abstractions and expecting them to be zero-cost.

measurablefunc 5 hours ago

How does that apply in this case? The whole point is that the agentic AI/AGI skips all the abstractions & writes optimized low-level code for each GPU vendor from a high-level specification. There are no abstractions other than whatever specifications GPU vendors provide for their hardware, which are fed into the agentic AI/AGI to do the necessary work of creating low-level & optimized code for specific tasks.
