lastdong 3 hours ago

A 14-billion-parameter model with 4-bit quantization seems rather small

derefr an hour ago | parent | next [-]

I think these aren't meant to be representative of arbitrary userland-workload LLM inferences, but rather the kinds of tasks macOS might spin up a background LLM inference for. Like the Apple Intelligence stuff, or Photos auto-tagging, etc. You wouldn't want the OS to ever be spinning up a model that uses 98% of RAM, so Apple probably considers themselves to have at most 50% of RAM as working headroom for any such workloads.
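
For a rough sense of why that budget works, here's a back-of-envelope footprint estimate in Python; the 20% overhead factor is my assumption for KV cache and runtime buffers, not a figure from Apple:

```python
def model_footprint_gb(params_billion: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Rough memory estimate; `overhead` (assumed ~20%) covers
    KV cache, activations, and runtime buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 14B model at 4-bit: ~7 GB of weights, ~8.4 GB with overhead,
# which fits inside a 50%-of-RAM budget even on smaller machines.
print(f"{model_footprint_gb(14, 4):.1f} GB")
```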

simlevesque 3 hours ago | parent | prev | next [-]

It's not much compared to a frontier model, but it can make a very useful specialized LLM.

giancarlostoro 3 hours ago | parent | prev | next [-]

On my 24GB RAM M4 Pro MBP some models run very quickly through LM Studio into Zed, and I was able to ask one to write some code. Of course my fan starts spinning like the world's ending, but it's still impressive what I can do 100% locally. I can't imagine what it's like on a more serious setup like a Mac Studio.
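
For anyone wanting to try a similar setup: LM Studio's local server speaks the OpenAI chat-completions API, which editors like Zed can point at. A minimal sketch, assuming the default port (1234) and using a placeholder model ID; substitute whatever your local install reports:

```python
import requests

# LM Studio serves an OpenAI-compatible endpoint locally;
# port 1234 is the default, but check your install's settings.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder -- use the ID LM Studio shows
        "messages": [{"role": "user",
                      "content": "Write a binary search in Python."}],
        "temperature": 0.2,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```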

efxhoy 2 hours ago | parent [-]

How is the output quality of the smaller models?

bilbo0s 3 hours ago | parent | prev | next [-]

It is.

That's how they make loot on their 128GB MacBook Pros. By kneecapping the cheap stuff. Don't think for a second that the specs weren't chosen so that professional developers would have to shell out the 8 grand for the legit machine. They're only gonna let us do the bare minimum on a MacBook Air.

butILoveLife 3 hours ago | parent | prev [-]

For anyone who has been watching Apple since the iPod commercials: Apple has long operated in a grey area when it comes to the honesty of its marketing.

And not even diehard Apple fanboys deny this.

I genuinely feel bad for people who fall for their marketing thinking they'll be running LLMs. Oh well, I got scammed on RuneScape as a child when someone said they could trim my armor... Everyone needs to learn.

giwook 2 hours ago | parent | next [-]

I don't know that there would be a huge overlap between the people who would fall for this type of marketing and the people who want to run LLMs locally.

There definitely are some who fit into this category, but if they're buying the latest and greatest on a whim then they've likely got money to burn and you probably don't need to feel bad for them.

Reminds me of the saying: "A fool and his money are soon parted".

zitterbewegung 2 hours ago | parent | prev [-]

Yesterday I ran qwen3.5:27b on an M1 Max with 64 GB of RAM. I have even run Llama 70B back when llama.cpp came out. These run sufficiently well, if somewhat slowly, but with the improvements in the M5 Max it should be a much faster experience.
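
The speed difference is largely memory bandwidth: each generated token has to stream the full set of weights from memory, so a crude ceiling on decode speed is bandwidth divided by model size. A sketch of that estimate; the M1 Max figure (~400 GB/s) is published, but the M5 Max number below is purely a placeholder assumption:

```python
def max_tokens_per_sec(model_gb: float, bandwidth_gbps: float) -> float:
    """Upper bound for decode speed: every token reads all weights once."""
    return bandwidth_gbps / model_gb

# Llama 70B at 4-bit is roughly 35 GB of weights.
for chip, bw in [("M1 Max", 400), ("M5 Max (assumed)", 800)]:
    print(f"{chip}: ~{max_tokens_per_sec(35, bw):.0f} tok/s ceiling")
```

Real throughput lands well below this ceiling, but it explains why doubling memory bandwidth roughly doubles perceived generation speed for models this large.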