lubitelpospat 17 hours ago

If you're using litert-lm on a Mac with Apple Silicon, DO NOT forget to pass "--backend gpu"! On my M1 Pro laptop, this single setting gave me 10x faster prefill and 2x faster decode. For anyone who knows the internals of litert-lm: what quantization does it use? How is the model only 3.4 GB?

EDIT: typo fix.