cybertim 2 hours ago
    $ nvidia-smi topo -p2p r
            GPU0    GPU1
    GPU0    X       CNS
    GPU1    CNS     X

I guess not. I use llama.cpp with:

    --spec-draft-n-max 3 --spec-type draft-mtp --split-mode tensor --tensor-split 1,1

and my generation speed is between 60 and 80 tk/s. I'll test this uncensored model, with ngram added as well, this weekend.

Btw, I also set my power limit to 220 W per card (with nvidia-smi). That will cost you around 1 tk/s but save you a LOT of power and heat :)
iMil 2 hours ago
CNS means "Chipset not supported", and I doubt that is the case here. Are you sure you are using the patched nvidia module? Run modinfo nvidia to check which one is loaded.
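If it helps when re-checking after a module change, here is a minimal sketch that scans a P2P matrix laid out like the one quoted above and lists every GPU pair whose status is anything other than OK. The function name p2p_not_ok is hypothetical, and the parser assumes the header row and each data row start with a GPUn label, as in that output; it is not part of nvidia-smi.

```shell
#!/bin/sh
# Hypothetical helper (not an nvidia-smi feature): list GPU pairs whose
# P2P status is not "OK". Reads the status matrix from stdin and assumes
# the layout quoted in this thread: a header row of GPU labels followed
# by one row per GPU.
p2p_not_ok() {
    awk '
        # First line starting with a GPU label is the header: keep column names.
        !seen && $1 ~ /^GPU[0-9]+$/ {
            for (i = 1; i <= NF; i++) col[i] = $i
            seen = 1
            next
        }
        # Later lines starting with a GPU label are data rows:
        # field 1 is the row label, field i (i >= 2) belongs to column i-1.
        seen && $1 ~ /^GPU[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i != "X" && $i != "OK")
                    print $1, col[i - 1], $i
        }
    '
}
```

Typical use would be to pipe the real output through it, e.g. nvidia-smi topo -p2p r | p2p_not_ok. Empty output means every pair reported OK.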