haellsigh 2 hours ago
FYI, I believe `--flash-attn on` doesn't do anything; you should use `--flash-attn 1` instead. I'm getting ~150 t/s on an RTX 3080 10GB as well, with the f16 cache type.
freakynit an hour ago
Thanks, updated my local docs :)