camel-cdr a day ago

> How do these fare in terms of absolute performance? The NEC TSUBASA is not a CPU.

The NEC is an attached accelerator, but IIRC it can run an OS in host mode. It's hard to tell how the others perform, because for most of them hardware isn't available yet, or only the vendor and its partner companies have access. They're also hard to compare, because they don't target the desktop market.

> I ported some numeric simulation kernel to the A64Fx some time ago, fixing the vector width gave a 2x improvement.

Oh, wow. Was this autovectorized, or hand-written with intrinsics/assembly?

Any chance it's of a small enough scope that I could try to recreate it?

> I was specifically referring to dynamic vector sizes.

Ah, sorry, yes, you're correct. It still shows that supporting VLA (vector-length agnostic) mechanisms in an ISA doesn't make it slower for fixed-size usage.

I'm not aware of any proper VLA vs. VLS (vector-length specific) comparisons. I once benchmarked VLA and VLS mandelbrot implementations and found no performance difference, but that example is too simple to generalize from.
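For anyone unfamiliar with the distinction, here's a rough sketch of how the two loop structures differ. This is a plain-Python emulation of the control flow only, not my actual benchmark and not SVE/RVV intrinsics. All function names and the `vlmax`/`width` parameters are hypothetical. `mandel_vector` stands in for one vector register's worth of lanes iterating in lockstep under an active mask. The VLA version asks for `vl = min(vlmax, remaining)` each strip, which is roughly what RVV's `vsetvl` or an SVE `whilelt` predicate gives you, so there's no separate tail. The VLS version hard-codes a width and needs a scalar tail loop.

```python
def mandel_iters(cr, ci, max_iter=50):
    """Scalar escape-time count for one point c = cr + ci*i."""
    zr = zi = 0.0
    for n in range(max_iter):
        if zr * zr + zi * zi > 4.0:
            return n
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    return max_iter


def mandel_vector(crs, cis, max_iter=50):
    """Emulates one vector of lanes running in lockstep with an active mask."""
    n = len(crs)
    zr, zi = [0.0] * n, [0.0] * n
    counts = [max_iter] * n
    active = [True] * n
    for it in range(max_iter):
        if not any(active):  # all lanes escaped: exit early, like a mask test
            break
        for lane in range(n):
            if not active[lane]:
                continue
            r, i = zr[lane], zi[lane]
            if r * r + i * i > 4.0:
                counts[lane] = it
                active[lane] = False
                continue
            zr[lane] = r * r - i * i + crs[lane]
            zi[lane] = 2.0 * r * i + cis[lane]
    return counts


def mandel_vla(crs, cis, vlmax, max_iter=50):
    """VLA strip-mining: vector length chosen at runtime, no scalar tail."""
    out, i = [], 0
    while i < len(crs):
        vl = min(vlmax, len(crs) - i)  # hypothetical stand-in for vsetvl/whilelt
        out += mandel_vector(crs[i:i + vl], cis[i:i + vl], max_iter)
        i += vl
    return out


def mandel_vls(crs, cis, width=4, max_iter=50):
    """VLS: fixed width baked in, leftover elements handled by a scalar tail."""
    n = len(crs)
    main = n - n % width
    out = []
    for i in range(0, main, width):
        out += mandel_vector(crs[i:i + width], cis[i:i + width], max_iter)
    for i in range(main, n):  # scalar tail loop
        out.append(mandel_iters(crs[i], cis[i], max_iter))
    return out
```

Both produce the same counts as the scalar reference. The point is that the VLA loop doesn't do any extra per-element work; the only difference is how the strip length and the tail are handled, which is why I wouldn't expect a big gap on something this simple.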