LuxBennu 12 hours ago:
Already running Qwen 70B 4-bit on an M2 Max (96GB) through llama.cpp, and it's pretty solid for day-to-day stuff. The MLX switch is interesting because Ollama was basically shelling out to llama.cpp on Mac before, so native MLX should mean better memory handling on Apple Silicon. Curious to see how it compares on the bigger models vs. the GGUF path.
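For anyone wanting to compare the two paths themselves, a rough sketch of the commands (model names and quant levels here are illustrative, not from the comment above — substitute whatever checkpoint you actually have):

```shell
# GGUF path: llama.cpp's CLI, running a local 4-bit quantized model.
# -ngl 99 offloads all layers to the GPU (Metal on Apple Silicon).
llama-cli -m ./qwen-70b-q4_k_m.gguf -ngl 99 -n 128 \
  -p "Explain memory mapping in one sentence."

# MLX path: the mlx-lm package (pip install mlx-lm) ships a generate
# command that pulls MLX-format weights, e.g. from mlx-community on HF.
mlx_lm.generate --model mlx-community/Qwen2.5-72B-Instruct-4bit \
  --prompt "Explain memory mapping in one sentence." --max-tokens 128
```

Both report tokens/sec at the end of a run, which makes for an easy apples-to-apples throughput check on the same prompt.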
goldenarm 8 hours ago | parent:
How many tokens per second?
zozbot234 8 hours ago | parent:
They initially messed up this launch and overwrote some of the GGUF models in their library, making them non-downloadable on platforms other than Apple Silicon. Hopefully that gets fixed.