jpalawaga | 6 days ago
you have outlier needs if an RTX, the fastest consumer-grade card, is not good enough for you. the Intel card is great for 1080p gaming. especially if you're just playing Counter-Strike, indie games, etc., you don't need a beast. very few people are trying to play Tomb Raider at 4K on ultra with a high refresh rate.
Sohcahtoa82 | 6 days ago
FWIW, my slowness is because of quantizing. I've been using Mistral 7B, and I can get 45 tokens/sec, which is PLENTY fast. But to save VRAM so I can game while doing inference (I run an IRC bot that lets people talk to Mistral), I quantize to 8 bits, which brings my inference speed down to ~8 tokens/sec.

For gaming, I absolutely love this card. I can play Cyberpunk 2077 with all the graphics settings set to the maximum and get 120+ fps. When playing a graphically intense game like that, though, I certainly need to kill the bot to free up the VRAM. But I can play something simpler like League of Legends and have inference happening while I play, with zero impact on game performance.

I also have 128 GB of system RAM. I've thought about loading the model in both 8-bit and 16-bit into system RAM and just swapping which one is in VRAM based on whether I'm playing a game, so that when I'm not playing something, the bot runs significantly faster.
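
To put rough numbers on that swap idea, here is a minimal back-of-the-envelope sketch in Python. It only does arithmetic and does not touch a real GPU or any inference library. The parameter count (about 7.2 billion for Mistral 7B) is an approximation, and `weight_gib` and `pick_precision` are hypothetical names made up for illustration. It also counts weights only, so KV cache and activations would come on top.

```python
# Rough weight-memory estimate for a 7B-class model at two precisions,
# plus a tiny policy for which copy to keep in VRAM.
# All names here are illustrative; nothing below talks to a real GPU.

PARAMS = 7.2e9  # approximate parameter count for Mistral 7B (assumption)

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_gib(precision: str, params: float = PARAMS) -> float:
    """Memory for weights only, in GiB (ignores KV cache and activations)."""
    return params * BYTES_PER_PARAM[precision] / 2**30


def pick_precision(game_running: bool) -> str:
    """Keep the smaller copy in VRAM while gaming, the 16-bit copy otherwise."""
    return "int8" if game_running else "fp16"


if __name__ == "__main__":
    for name in BYTES_PER_PARAM:
        print(f"{name}: ~{weight_gib(name):.1f} GiB of weights")
    print("while gaming ->", pick_precision(True))
    print("while idle   ->", pick_precision(False))
```

Under these assumptions the two copies together come to roughly 20 GiB of weights, so holding both in 128 GB of system RAM looks feasible. Whether a given 8-bit backend can actually be moved between system RAM and VRAM after loading is a separate question this sketch does not answer.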