steinvakt2 9 days ago
And flash attention doesn't work on the 5090 yet, right? So currently the 4090 is probably faster, no?
PeterStuer 9 days ago
I don't think the 4090 has native 4-bit support, which will probably have a significant impact.
diggan 9 days ago
> And flash attention doesn't work on the 5090 yet, right?

Flash attention works with GPT-OSS + llama.cpp (tested on 1d72c8418) and another Blackwell card (RTX Pro 6000), so I think it should work on the 5090 as well; it's the same architecture, after all.
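For reference, this is roughly how I'd turn it on from the CLI. The model filename is just a placeholder, and flag syntax can vary a bit between llama.cpp builds, so check llama-cli --help on your version:

    # placeholder model path; -fa enables flash attention,
    # -ngl 99 offloads all layers to the GPU, -p sets the prompt
    ./llama-cli -m gpt-oss-20b-mxfp4.gguf -fa -ngl 99 -p "Hello"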