| ▲ | txtsd 8 hours ago |
| So I can use this in claude code with `ollama run claude`? |
|
| ▲ | Ladioss 7 hours ago | parent | next [-] |
| More like `ollama launch claude --model qwen3.6:latest` Also you need to check your context size, Ollama default to 4K if <24 Gb of VRAM and you need 64K minimum if you want claude to be able to at least lift a finger. |
| |
| ▲ | Patrick_Devine 5 hours ago | parent | next [-] | | If you're on a Mac, use the MLX backend versions which are considerably faster than the GGML based versions (including llama.cpp) and you don't need to fiddle with the context size. The models are `qwen3.6:35b-a3b-nvfp4`, `qwen3.6:35b-a3b-mxfp8`, and `qwen3.6:35b-a3b-mlx-bf16`. | |
| ▲ | txtsd 4 hours ago | parent | prev [-] | | I only have 16GB VRAM, and my system uses ~4GB from that. What are my options? I got this one: `Qwen3.6-35B-A3B-UD-IQ2_XXS.gguf` |
|
|
| ▲ | nunodonato 6 hours ago | parent | prev | next [-] |
| https://sleepingrobots.com/dreams/stop-using-ollama/ |
|
| ▲ | pj_mukh 8 hours ago | parent | prev [-] |
| have you found a model that does this with usable speeds on an M2/M3? |
| |
| ▲ | postalcoder 8 hours ago | parent [-] | | On a M4 MBP ollama's qwen3.5:35b-a3b-coding-nvfp4 runs incredibly fast when in the claude/codex harness. M2/M3 should be similar. It's incomparably faster than any other model (i.e. it's actually usable without cope). Caching makes a huge difference. |
|