burmanm 3 hours ago
It can run and token generation is fast enough, but prompt processing is so slow that it makes them next to useless. That is the case with my M3 Pro at least, compared to the RTX I have on my Windows machine. This is why I'm personally waiting for the M5/M6 to finally have decent prompt processing performance; it makes a huge difference in all the agentic tools.
storus an hour ago
Just add a DGX Spark for token prefill and stream it to the M3 using Exo. The M5 Ultra should have about the same FP4 compute as the DGX Spark, and this way you don't have to wait until Apple releases it. Also, a 128GB "appliance" like that is now "super cheap" given RAM prices, and that won't last long.