sleepybrett 4 hours ago

Yeah, I'm hoping that gets smoother. I've been experimenting with omlx and opencode on my m5x64gb and keep running into issues w/ Qwen3.6-35B-A3B-MLX-8bit exceeding its memory limit at the most inopportune times. Playing with 12B gemma4 (8bit) more today.

Maybe I should be aiming for something targeting 48gb of memory?

manapause 3 hours ago

It depends on what your goals are and what you are using it for. This space is fluid, and my answer last week would be different from my answer today! That said, there’s no substitute for hard work. Here are some resources to get you up to speed:

https://carteakey.dev/blog/local-inference/local-llm-optimiz...

https://botmonster.com/ai/self-hosted-ai-agent-frameworks-20...

Personally, I find myself swapping models depending on whether I am engaged in “trad-development” or building agentic probes or apps involving imagery. Tailscale the LLM to your deployments and ta-da!