Remix clone Hacker News


cybertim 2 hours ago

$ nvidia-smi topo -p2p r

        GPU0    GPU1
GPU0    X       CNS
GPU1    CNS     X

I guess not. I use llama.cpp with:

--spec-draft-n-max 3 --spec-type draft-mtp --split-mode tensor --tensor-split 1,1

and my generation speed is between 60 and 80 tk/s.

I'll test this uncensored model, with ngram added as well, this weekend.

By the way, I also set my power limit to 220 W per card (with nvidia-smi). That will cost you around 1 tk/s but save you a LOT of power and heat :)
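
For anyone wanting to try the same cap, here is a minimal sketch, assuming the two cards are GPU indices 0 and 1 and that the driver allows changing the limit (this normally needs root). `set_power_limits` and `DRY_RUN` are hypothetical names made up for this example; the only real command involved is `nvidia-smi -i <index> -pl <watts>`.

```shell
#!/bin/sh
# Hypothetical helper: cap each listed GPU at the given power limit (watts).
# Arguments: <watts> <gpu-index> [<gpu-index> ...]
# With DRY_RUN=1 the nvidia-smi commands are printed instead of executed,
# so the call can be previewed on a machine without root or without a GPU.
set_power_limits() {
    watts="$1"
    shift
    for idx in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "nvidia-smi -i $idx -pl $watts"
        else
            nvidia-smi -i "$idx" -pl "$watts"
        fi
    done
}
```

Preview with `DRY_RUN=1 set_power_limits 220 0 1`, then run it for real as root with `set_power_limits 220 0 1`. The limit is not persistent across reboots, so it would need to be reapplied at boot.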

iMil 2 hours ago

CNS means "Chipset not supported", and I doubt that is the case here. Are you sure you are using the patched nvidia module? Run modinfo nvidia to check which one is loaded.
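
A minimal sketch of that check, trimmed down to the fields that identify the build. `nvidia_module_summary` is a hypothetical helper name, and which field actually distinguishes a patched build from the stock one is an assumption: compare the filename and version against what you installed.

```shell
#!/bin/sh
# Hypothetical helper: reads `modinfo nvidia` output on stdin and keeps only
# the filename, version and license fields.
nvidia_module_summary() {
    grep -E '^(filename|version|license):'
}

# Typical use (not executed here, since it needs the module to be installed):
#   modinfo nvidia | nvidia_module_summary
# modinfo describes the module file on disk; the version of the driver that
# is actually running can be read with:
#   cat /proc/driver/nvidia/version
```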

cybertim 2 hours ago

I'm using Bazzite on my AI rig just because it has the GPU-optimized things set up (also nvidia-open). Looking into it, P2P seems to be available only for the 90 models of the NVIDIA RTX GPU line, not the 80 models, and for some 50xx models? (apparently the 5080?). Anyway, I downloaded that uncensored model and tweaked those KV settings etc. I'm still getting 60-80 tk/s, but I'm able to get my context up to 180224 now. It used to be 131072, which gave me some trouble, so this is already a win :)