doctorpangloss 5 hours ago

DeepSeek v4 Flash on MLX at 1M context runs at 20 t/s decode on a Mac Studio M3 Ultra with 512 GB of RAM.
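For a sense of what 20 t/s decode means in practice, here's a back-of-envelope sketch (the reply lengths below are illustrative assumptions, not part of the benchmark):

```python
# Back-of-envelope: wall-clock time spent in decode at a given speed.
# The 20 t/s figure comes from the comment above; the token counts
# are hypothetical reply lengths chosen for illustration.

def decode_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds of decode time to emit `tokens` output tokens."""
    return tokens / tokens_per_sec

for n in (100, 1000, 4000):
    print(f"{n:>5} tokens at 20 t/s -> {decode_seconds(n, 20.0):.0f} s")
```

So a long 4,000-token reply would take a bit over three minutes of decode, which is usable but not snappy.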

alfiedotwtf an hour ago | parent | next [-]

What is everyone running DeepSeek v4 Flash with?!

It’s currently unsupported in llama.cpp, and vLLM doesn’t support GPU+CPU MoE, so unless all of you have an array of DGX Sparks in your bedroom, what’s the secret sauce?!
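(For reference, the route the grandparent comment describes is MLX on Apple Silicon, where weights sit in unified memory and the GPU+CPU split question doesn't arise. A minimal mlx-lm invocation looks like the sketch below; the model path is a placeholder, since it's not stated which MLX conversion, if any, was used:)

```shell
# Sketch only: assumes an MLX-converted checkpoint exists and that
# this runs on Apple Silicon, where mlx-lm keeps everything in
# unified memory rather than splitting across GPU and CPU.
pip install mlx-lm

# <model-path> is a placeholder for a local or Hub MLX conversion.
mlx_lm.generate --model <model-path> --prompt "Hello" --max-tokens 128
```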

dakolli 5 hours ago | parent | prev [-]

Just because you read it on a GitHub repo doesn't make it true. It also doesn't take into account CPU temps and the inevitable throttling you'll encounter.

doctorpangloss 4 hours ago | parent [-]

I ran it on my own device, haha.

I don't understand why people are in such disbelief at how much better this stuff runs on a Mac Studio than on NVIDIA hardware with 1/5th the VRAM. Look, what can I say? NVIDIA is a bigger rip-off than Apple is!

platevoltage 4 hours ago | parent [-]

Which is good, because NVIDIA pulling a Micron and ceasing consumer hardware production is right around the corner.