androiddrew 2 hours ago

Could you share what you are using for inference and how you are running it? I have a 64G VRAM/128G system RAM setup.

sosodev 25 minutes ago | parent [-]

Most people are using something in the llama.cpp family for inference; llama-server is my go-to. The Unsloth guides describe how to configure inference for your model of choice.
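For concreteness, a minimal llama-server launch might look something like the sketch below. The model path and tuning values are placeholders, not anything specific to your setup; the Unsloth guide for a given model will suggest concrete context sizes and sampling settings.

```shell
# Sketch: serving a local GGUF model with llama-server (from llama.cpp).
# All values here are assumptions -- tune them for your hardware,
# e.g. how many layers fit in 64G of VRAM.
llama-server \
  -m ./models/your-model.gguf \
  -c 8192 \
  -ngl 99 \
  --host 127.0.0.1 \
  --port 8080
# -m: path to the GGUF file
# -c: context window size
# -ngl: number of layers to offload to the GPU (99 = "as many as possible")
# The server exposes an OpenAI-compatible chat API on the given host/port.
```

With more system RAM than VRAM (like your 128G/64G split), lowering `-ngl` lets the rest of the model run from CPU memory at reduced speed.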