iforgotpassword 3 days ago

What I don't get is why they don't at least assign a dev or two to make the poster child of this work: llama.cpp

It's the first thing anyone tries when dabbling in AI or compute on the GPU, yet it's a clusterfuck to get working. A few blessed cards work, given the proper drivers and kernel; others just crash, run horribly slowly, or output GGGGGGGGGGGGGG to every input (I'm not making this up!). Then you LOL, dump it, and go buy Nvidia, et voilà, stuff works first try.

wkat4242 3 days ago

It does work; I have it running on my Radeon Pro VII.

Filligree 2 days ago

It sometimes works.

wkat4242 2 days ago

How so? It's rock solid for me. I use ollama, but that's based on llama.cpp.

It's also quite fast, probably because that card has fast HBM2 memory (roughly the same memory bandwidth as a 4090). And it was really cheap, since it was on deep sale as an outgoing model.

Filligree 2 days ago

"Sometimes" as in "on some cards". You're having luck with yours, but that doesn't mean it's a good place to build a community.

wkat4242 2 days ago

Ah, I see. Yes, but you pick the card for the purpose, of course. I also don't like how limited the card support in ROCm is. But when it works, it works well.

I have Nvidia cards too by the way, a 4090 and a 3060 (the latter I use for AI also, but more for Whisper because faster-whisper doesn't do ROCm right now).

halJordan 2 days ago

Aside from the fact that gfx906 is one of the blessed architectures mentioned (so why would it not work?), how do you look at your specific instance and then turn around and say "All of you are lying, it works perfectly"? How do you square that circle in your head?

wkat4242 2 days ago

No, I was just a bit thrown by the "sometimes". I thought they were referring to a reliability issue. I am aware of the limited card support in ROCm, and I complained about it elsewhere in the thread too.

Also, I didn't accuse anyone of lying. No need to be so confrontational. And my remark to the original poster at the top was made before they clarified their post.

I just don't really see what AMD can do to make ollama work better, other than porting ROCm to all their cards, which is definitely something they should do.

And no, I'm not an AMD fanboi. I have no loyalty to anyone: any company or any country.