haellsigh 2 hours ago

FYI, I believe `--flash-attn on` doesn't do anything; you should use `--flash-attn 1` instead. I'm also getting ~150 t/s on an RTX 3080 10GB with the f16 cache type.

freakynit an hour ago

Thanks.. updated my local docs :)