exe34 4 days ago
llama.cpp + a quantized GGUF: https://huggingface.co/bartowski/Alibaba-NLP_Tongyi-DeepRese... Get the biggest quant that will fit in your VRAM.
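For anyone who hasn't wired this up before, a minimal sketch using the llama-cpp-python bindings (the GGUF filename below is a placeholder; substitute whichever quant you actually downloaded from that repo):

    # pip install llama-cpp-python  (build with CUDA/Metal support for GPU offload)
    from llama_cpp import Llama

    llm = Llama(
        model_path="Tongyi-DeepResearch-30B-A3B-Q4_K_M.gguf",  # placeholder filename
        n_gpu_layers=-1,  # offload all layers to the GPU; reduce if VRAM runs out
        n_ctx=8192,       # context window; larger costs more memory
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarise the llama.cpp project in one paragraph."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])

If a full offload doesn't fit, lowering n_gpu_layers splits the model between GPU and CPU instead of failing outright.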
trebligdivad 4 days ago | parent | next
How do people deal with all the different quantisations? Generally, if I see an Unsloth quant I'm happy to try it locally; with random other people's uploads, how do I know what I'm getting? (If nothing else, Tongyi are currently winning AI with the cutest logo.)
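One partial answer to "what am I getting": the quant recipe is recorded in the file itself, so you can inspect it before deciding how much to trust an upload. A sketch using the gguf Python package that ships alongside llama.cpp (the path is a placeholder):

    # pip install gguf
    from collections import Counter
    from gguf import GGUFReader

    reader = GGUFReader("some-model-Q4_K_M.gguf")  # placeholder path

    # Print the metadata keys (general.name, tokenizer settings, etc.)
    for key in reader.fields:
        print(key)

    # Tally per-tensor quant types: a "Q4_K_M" file is really a mix,
    # e.g. Q4_K for most weights with Q6_K on a few sensitive tensors.
    print(Counter(t.tensor_type.name for t in reader.tensors))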
davidsainez 4 days ago | parent | prev
This is the way. I managed to run (super) tiny models on CPU only with this approach.
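For reference, the CPU-only variant of the sketch above is just a matter of not offloading (model filename again hypothetical):

    from llama_cpp import Llama

    # n_gpu_layers=0 keeps inference entirely on the CPU; small models at
    # aggressive quants (e.g. ~1B params at Q4) run fine in a few GB of RAM.
    llm = Llama(model_path="tiny-model-Q4_K_M.gguf", n_gpu_layers=0, n_threads=8)
    print(llm("Q: What is the capital of France? A:", max_tokens=16)["choices"][0]["text"])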