Remix clone Hacker News

	▲	lubitelpospat 17 hours ago
		If you're using litert-lm on a Mac with Apple Silicon, DO NOT forget to pass "--backend gpu"! On my M1 Pro laptop this single setting gave 10x the prefill performance and 2x the decode performance. To anyone who knows how the internals of litert-lm work: what quantization does it use? How come the model is just 3.4 GB in size? EDIT: typo fix.