Remix.run Logo
kamranjon 3 hours ago

Benchmarks are here: https://kyuz0.github.io/amd-strix-halo-vllm-toolboxes/

Would love to see DeepSeek V4 flash/pro and MiniMax M3 benchmarks but already these are pretty impressive, first strix Halo setup I've seen with some serious performance.

EDIT: Apologies - I think I misunderstood these benchmarks - it seems this is actually very slow when compared to a M4 or M5 chip with a good amount of memory. Looking at the creators video here: https://youtu.be/Cfl3TS7ME5s?t=734 -- it seems the performance of strix halo is much much slower than I get on my M4 MBP - which gets ~400 prefill and ~20 tok/s generation

Tepix 2 hours ago | parent [-]

The pp speeds are really slow (50), I think there‘s room for improvement still.

kamranjon 2 hours ago | parent [-]

Ah yea after watching one of the creators youtube videos I realize these benchmarks are combining prefill and decode which isn't super helpful - it seems this struggles with the exact same bottlenecks as all strix halo setups, memory bandwidth. It seems this is still significantly slower than equivalent memory sizing on Mac hardware.