exe34 4 days ago

llama.cpp + quantized: https://huggingface.co/bartowski/Alibaba-NLP_Tongyi-DeepRese...

get the biggest one that will fit in your vram.
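
A rough sketch of that rule in Python, in case it helps. The file names and sizes are made up for illustration (check the actual file list on the repo page), and the 1.5 GiB of headroom for the KV cache and other buffers is just an assumption; how much you really need depends on your context length and setup.

```python
from typing import Optional


def pick_largest_fit(
    sizes_gib: dict[str, float],
    vram_gib: float,
    headroom_gib: float = 1.5,  # assumed reserve for KV cache etc., not a measured value
) -> Optional[str]:
    """Return the name of the largest file that fits in VRAM minus headroom."""
    budget = vram_gib - headroom_gib
    # Keep only the quantizations whose file size is within the budget.
    fitting = {name: size for name, size in sizes_gib.items() if size <= budget}
    if not fitting:
        return None  # nothing fits; try a smaller quant or offload fewer layers
    # Among those that fit, the largest file is the least aggressively quantized.
    return max(fitting, key=fitting.get)


# Hypothetical file names and sizes in GiB, for illustration only.
candidates = {
    "model-Q4_K_M.gguf": 18.6,
    "model-Q5_K_M.gguf": 21.7,
    "model-Q6_K.gguf": 25.1,
    "model-Q8_0.gguf": 32.5,
}

choice = pick_largest_fit(candidates, vram_gib=24.0)
print(choice)
```

File size is only a proxy for memory use, so treat the result as a starting point and drop down a size if loading fails.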

trebligdivad 4 days ago

How do people deal with all the different quantisations? Generally if I see an Unsloth one I'm happy to try it locally; with random other people's uploads, how do I know what I'm getting?

(If nothing else, Tongyi are currently winning AI with the cutest logo)

exe34 4 days ago

personally I've only used them for toying around, but in all cases you have to test them for your use case anyway.

davidsainez 4 days ago

This is the way. I managed to run (super) tiny models on CPU only with this approach.