doctorpangloss 5 hours ago
deepseek v4 flash on mlx at 1m context runs at 20 t/s decode on a mac studio m3 ultra with 512gb of RAM
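A quick sanity check on that 20 t/s figure: single-stream decode is roughly memory-bandwidth-bound, so tokens/sec is capped near memory bandwidth divided by bytes read per token. The sketch below assumes the M3 Ultra's ~819 GB/s peak bandwidth and a hypothetical 37B active parameters at 4-bit quantization — the model's actual active-parameter count and quant level are assumptions, not from the thread:

```python
# Back-of-envelope decode ceiling for a bandwidth-bound MoE model.
# Assumptions (hypothetical, not from the thread): 37B active params, 4-bit quant.
GBPS = 819               # M3 Ultra peak unified-memory bandwidth, GB/s
ACTIVE_PARAMS = 37e9     # parameters touched per decoded token (assumed)
BYTES_PER_PARAM = 0.5    # 4-bit quantization

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM   # ~18.5 GB read per token
ceiling_tps = GBPS * 1e9 / bytes_per_token          # ideal tokens/sec
print(f"theoretical ceiling ~ {ceiling_tps:.0f} t/s")
```

Under these assumptions the ceiling lands around 44 t/s, so an observed 20 t/s (~45% of peak) is plausible once KV-cache reads and attention overhead at long context are counted.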
alfiedotwtf an hour ago
What is everyone running DeepSeek v4 Flash with?! It's currently unsupported in llama.cpp, and vLLM doesn't support GPU+CPU MoE offload, so unless all of you have an array of DGX Sparks in your bedroom, what's the secret sauce?!
dakolli 5 hours ago
Just because you read it in a GitHub repo doesn't make it true. It also doesn't account for CPU temps and the inevitable throttling you'll encounter.