liuliu 4 hours ago

I am not sure you set it up right. Did you have a runnable WolframLanguage file so it can compare results? Did you give it H100 / H200 access to compile and then iterate?

My experience is that once you have these two, it does amazing kernel work (Codex-5.4).

acuozzo 4 hours ago | parent [-]

> Did you have a runnable WolframLanguage file so it can compare results?

Yes.

> Did you give it H100 / H200 access to compile and then iterate?

Yes, via Lambda.ai. Also, FWIW, I run claude with --dangerously-skip-permissions and codex with the equivalent flag.

> it does amazing kernel work (Codex-5.4)

Specifically with WGMMA + TMA?

---

Once TMA gets involved both Claude and Codex spin endlessly until they dump TMA for a slower fallback.
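For context on why TMA is a sharp edge: on Hopper, TMA transfers are driven by a tensor map descriptor that has to be encoded on the host with the CUDA driver API and handed to the kernel. Below is a minimal hedged sketch of that host-side setup; the matrix shape, 64x64 box size, and 128B swizzle are illustrative assumptions, not a tuned configuration.

```cuda
#include <cuda.h>    // CUDA driver API: cuTensorMapEncodeTiled (CUDA >= 12.0)
#include <stdio.h>
#include <stdint.h>

// Sketch: encode a TMA descriptor for a row-major 2D fp16 matrix in global
// memory, to be loaded in 64x64 tiles with 128-byte swizzling. All sizes here
// are assumptions for illustration; a real kernel would match the box to its
// WGMMA tile shape and shared-memory layout.
static CUtensorMap make_tma_desc(void *gmem, uint64_t rows, uint64_t cols) {
    CUtensorMap desc;
    uint64_t global_dim[2]    = {cols, rows};   // innermost dimension first
    uint64_t global_stride[1] = {cols * 2};     // row stride in bytes (fp16),
                                                // innermost stride is implicit
    uint32_t box_dim[2]       = {64, 64};       // shared-memory tile per copy
    uint32_t elem_stride[2]   = {1, 1};         // dense, no element striding
    CUresult r = cuTensorMapEncodeTiled(
        &desc,
        CU_TENSOR_MAP_DATA_TYPE_FLOAT16,
        2,                                  // tensor rank
        gmem,                               // 16-byte-aligned global pointer
        global_dim,
        global_stride,
        box_dim,
        elem_stride,
        CU_TENSOR_MAP_INTERLEAVE_NONE,
        CU_TENSOR_MAP_SWIZZLE_128B,
        CU_TENSOR_MAP_L2_PROMOTION_L2_128B,
        CU_TENSOR_MAP_FLOAT_OOB_FILL_NONE);
    if (r != CUDA_SUCCESS)
        fprintf(stderr, "cuTensorMapEncodeTiled failed: %d\n", r);
    return desc;
}
```

The descriptor is then passed by value to the kernel (typically as a __grid_constant__ parameter), where cp.async.bulk.tensor instructions pull tiles into shared memory behind an mbarrier. Getting any one of the alignment, swizzle mode, or box dimensions wrong either faults or silently degrades the copy, which is plausibly the point where the agents give up and fall back to a non-TMA path.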

I've observed this with Claude-Code running Opus 4.6 with reasoning set to medium, high, and max; with "adaptive thinking" both enabled and disabled; and with thinking tokens maxed out.

I've also observed it with Codex running GPT-5.4 as well as GPT-5.3-Codex, with reasoning effort from medium to xhigh.

---

I've also observed this on the web, as mentioned in my OP, with GPT-5.4pro (Extended Pro), Gemini3-DeepThink, and Opus 4.6.

liuliu an hour ago | parent [-]

That is informative, thanks! Yes, I observe the same thing: the model tends to give up (as you said, it "dumps TMA for a slower fallback") and needs active steering to get good results. But it still gets much further than a one-shot from the chat interface, and it knows far more about profiling / kernel coding than those one-shots do.