Remix.run Logo
zozbot234 4 hours ago

You could always offload some layers to the NPU for lower power use and leave the rest to the GPU. If the latter is power throttled (common for prefill, not for decode) that will be a performance improvement.