androiddrew 2 hours ago

Could you share what you are using for inference and how you are running it? I have a 64G VRAM/128G system RAM setup.

sosodev 25 minutes ago | parent [-]

Most people are using something in the llama.cpp family for inference; llama-server is my go-to. The Unsloth guides describe how to configure inference for your model of choice.
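For concreteness, a minimal llama-server launch might look something like the sketch below. The model path and tuning values are placeholders, not anything specific to your setup; the Unsloth guide for a given model will suggest concrete context sizes and sampling settings.

```shell
# Sketch: serving a local GGUF model with llama-server (from llama.cpp).
# All values here are assumptions -- tune them for your hardware,
# e.g. how many layers fit in 64G of VRAM.
llama-server \
  -m ./models/your-model.gguf \
  -c 8192 \
  -ngl 99 \
  --host 127.0.0.1 \
  --port 8080
# -m: path to the GGUF file
# -c: context window size
# -ngl: number of layers to offload to the GPU (99 = "as many as possible")
# The server exposes an OpenAI-compatible chat API on the given host/port.
```

With more system RAM than VRAM (like your 128G/64G split), lowering `-ngl` lets the rest of the model run from CPU memory at reduced speed.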