o11c 4 days ago

It's useless to mention the number of parameters without also mentioning quantization, and to a lesser-but-still-significant extent context size, since those together determine how much RAM is needed.
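A rough back-of-envelope sketch of that calculation (the bits-per-parameter values, KV-cache formula, and default layer/head numbers below are illustrative assumptions, not figures for any specific model or runtime, which all add their own overhead):

    # Rough RAM estimate: weights + KV cache, ignoring runtime overhead.
    def estimate_ram_gb(
        params_b: float,        # parameter count in billions
        bits_per_param: float,  # e.g. 16 (fp16), 8 (Q8), ~4.5 (a 4-bit quant)
        context: int = 8192,    # context window in tokens
        layers: int = 32,       # transformer layers (assumed)
        kv_heads: int = 8,      # KV heads, grouped-query attention (assumed)
        head_dim: int = 128,    # dimension per head (assumed)
        kv_bits: int = 16,      # KV cache precision
    ) -> float:
        weights = params_b * 1e9 * bits_per_param / 8
        # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context tokens
        kv_cache = 2 * layers * kv_heads * head_dim * context * kv_bits / 8
        return (weights + kv_cache) / 1e9

    # Same nominal "8B" model, very different footprint:
    print(f"fp16: ~{estimate_ram_gb(8, 16):.1f} GB")   # roughly 16+ GB
    print(f"Q4:   ~{estimate_ram_gb(8, 4.5):.1f} GB")  # roughly 5 GB

So "an 8B model" can mean anything from ~5 GB to ~16 GB before you even count a long context.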

"It will run" is a different thing than "it will run without swapping or otherwise hitting a slow storage access path". That makes a speed difference of multiple orders of magnitude.

This is one thing Ollama is good for. Possibly the only thing, if you listen to some of its competitors. But the choice of runner does nothing to avoid the fact that all LLMs are just toys.