androiddrew 2 hours ago
Could you share what you are using for inference and how you are running it? I have a setup with 64 GB of VRAM and 128 GB of system RAM.
sosodev 25 minutes ago | parent
Most people are using something in the llama.cpp family for inference. llama-server is my go-to. The Unsloth guides describe how to configure inference for your model of choice.
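To make that concrete, a typical llama-server invocation might look like the sketch below. The model filename and flag values are illustrative, not from this thread; pick a quantization that fits your VRAM and check the Unsloth guide for model-specific settings.

```shell
# Serve a quantized GGUF model with llama.cpp's llama-server.
# Model path and values are illustrative assumptions.
llama-server \
  -m ./models/your-model-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --ctx-size 8192 \
  --port 8080
```

`--n-gpu-layers` controls how many transformer layers are offloaded to the GPU (set it high to offload everything that fits), and the server exposes an OpenAI-compatible HTTP API on the given port, so most existing chat clients can point at it directly.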