Remix.run Logo
aurareturn 4 days ago

Neural Engine is optimized for power efficiency, not performance.

Look for Apple to add matmul acceleration into the GPU instead. Thats how to truly speed up local LLMs.